[RFC/PATCHv7 02/22] Add a script to edit/inspect notes

Previous thread: [RFC/PATCHv7 06/22] fast-import: Add support for importing commit notes by Johan Herland on Friday, October 9, 2009 - 3:22 am. (1 message)

Next thread: [PATCH] git-svn: hide find_parent_branch output in double quiet mode by Simon Arlott on Friday, October 9, 2009 - 5:21 am. (3 messages)
From: Johan Herland
Date: Friday, October 9, 2009 - 3:21 am

Hi,

(Feel free to put this on hold until v1.6.5 is released. In any case,
I'm going to Berlin for the weekend, and don't expect to read much
email...)

Here is the 7th iteration of the git-notes series. Changes in this
iteration are as follows:

- Rebased onto current 'next'

- Patch 1: Include minor leak fix

- Patch 10: Rename free_commit_notes() to free_notes() (Notes are
  no longer bound to commits only, see patch 15 for details)

- Patch 12: Remove tests that are invalidated by concatenation code
  in patch 13.

Overall, I consider the 12 first patches fairly stable at this point.
There's also a slew of new patches, that has more of an RFC status:

- Patches 13-14: Concatenation of multiple notes annotating the same
  commit/object. This was originally suggested by mugwump many months
  ago, and the suggestion was re-iterated by Dscho. This change has a
  minor perfomance impact (see [1]), but I still think it's worth it.

- Patch 15: Allow notes to be attached to any object (not just commits).
  Rename get_commit_notes() to format_note() to reflect this change.

- Patch 16-19: Expand notes API in preparation for querying and
  manipulating notes from elsewhere in Git (see patch 22 for examples).

- Patch 20: Add a new notes_tree struct, and use it as the first
  parameter to all functions in the notes API. This allows API users to
  maintain their own (multiple, concurrent) notes trees (see patch 22
  for an example). We still have a default notes tree in notes.c as a
  fallback (when NULL is passed as to an API function).

- Patch 21: The default behaviour when there are multiple notes for a
  given object is to concatenate them. However, some callers (see patch
  22) want to tweak this behaviour. This patch defines a new function
  type: combine_notes_fn, for combining two notes that reference the
  same object. The notes API is then expanded to allow the caller to
  specify a suitable combine_notes_fn. For convenience, three simple
  combine_notes ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:21 am

From: Johannes Schindelin <Johannes.Schindelin@gmx.de>

The script 'git notes' allows you to edit and show commit notes, by
calling either

	git notes show <commit>

or

	git notes edit <commit>

This patch has been improved by the following contributions:
- Tor Arne Vestbø: fix printing of multi-line notes
- Michael J Gruber: test and handle empty notes gracefully
- Thomas Rast:
  - only clean up message file when editing
  - use GIT_EDITOR and core.editor over VISUAL/EDITOR
  - t3301: fix confusing quoting in test for valid notes ref
  - t3301: use test_must_fail instead of !
  - refuse to edit notes outside refs/notes/
- Junio C Hamano: tests: fix "export var=val"
- Christian Couder: documentation: fix 'linkgit' macro in "git-notes.txt"
- Johan Herland: minor cleanup and bugfixing in git-notes.sh (v2)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 .gitignore                  |    1 +
 Documentation/git-notes.txt |   46 +++++++++++++++++
 Makefile                    |    1 +
 command-list.txt            |    1 +
 git-notes.sh                |   73 +++++++++++++++++++++++++++
 t/t3301-notes.sh            |  114 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 236 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-notes.txt
 create mode 100755 git-notes.sh
 create mode 100755 t/t3301-notes.sh

diff --git a/.gitignore b/.gitignore
index 51a37b1..cbafa64 100644
--- a/.gitignore
+++ b/.gitignore
@@ -86,6 +86,7 @@ git-mktag
 git-mktree
 git-name-rev
 git-mv
+git-notes
 git-pack-redundant
 git-pack-objects
 git-pack-refs
diff --git a/Documentation/git-notes.txt ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Created by a simple refactoring of initialize_notes().

Also add a new 'flags' parameter, which is a bitwise combination of notes
initialization flags. For now, there is only one flag - NOTES_INIT_EMPTY -
which indicates that the notes tree should not auto-load the contents of
the given (or default) notes ref, but rather should leave the notes tree
initialized to an empty state. This will become useful in the future when
manipulating the notes tree through the notes API.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |   27 ++++++++++++++++-----------
 notes.h |   20 ++++++++++++++++++++
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/notes.c b/notes.c
index 0f7082f..f2bacbb 100644
--- a/notes.c
+++ b/notes.c
@@ -339,13 +339,25 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 	free(buf);
 }
 
-static void initialize_notes(const char *notes_ref_name)
+void init_notes(const char *notes_ref, int flags)
 {
 	unsigned char sha1[20], object_sha1[20];
 	unsigned mode;
 	struct leaf_node root_tree;
 
-	if (!notes_ref_name || read_ref(notes_ref_name, object_sha1) ||
+	assert(!initialized);
+	initialized = 1;
+
+	if (!notes_ref) {
+		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		if (env)
+			notes_ref = getenv(GIT_NOTES_REF_ENVIRONMENT);
+		else
+			notes_ref = GIT_NOTES_DEFAULT_REF;
+	}
+
+	if (flags & NOTES_INIT_EMPTY || !notes_ref ||
+	    read_ref(notes_ref, object_sha1) ||
 	    get_tree_entry(object_sha1, "", sha1, &mode))
 		return;
 
@@ -378,15 +390,8 @@ void format_note(const unsigned char *object_sha1, struct strbuf *sb,
 	unsigned long linelen, msglen;
 	enum object_type type;
 
-	if (!initialized) {
-		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
-		if (env)
-			notes_ref_name = getenv(GIT_NOTES_REF_ENVIRONMENT);
-		else if (!notes_ref_name)
-			notes_ref_name = GIT_NOTES_DEFAULT_REF;
-		initialize_notes(notes_ref_name);
-		initialized = 1;
-	}
+	if ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces

Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c  |    8 +++++---
 notes.h  |    5 ++++-
 pretty.c |    3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/notes.c b/notes.c
index 2b66723..b7d79e1 100644
--- a/notes.c
+++ b/notes.c
@@ -106,7 +106,7 @@ static unsigned char *lookup_notes(const unsigned char *commit_sha1)
 }
 
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding)
+		const char *output_encoding, int flags)
 {
 	static const char utf8[] = "utf-8";
 	unsigned char *sha1;
@@ -148,12 +148,14 @@ void get_commit_notes(const struct commit *commit, struct strbuf *sb,
 	if (msglen && msg[msglen - 1] == '\n')
 		msglen--;
 
-	strbuf_addstr(sb, "\nNotes:\n");
+	if (flags & NOTES_SHOW_HEADER)
+		strbuf_addstr(sb, "\nNotes:\n");
 
 	for (msg_p = msg; msg_p < msg + msglen; msg_p += linelen + 1) {
 		linelen = strchrnul(msg_p, '\n') - msg_p;
 
-		strbuf_addstr(sb, "    ");
+		if (flags & NOTES_INDENT)
+			strbuf_addstr(sb, "    ");
 		strbuf_add(sb, msg_p, linelen);
 		strbuf_addch(sb, '\n');
 	}
diff --git a/notes.h b/notes.h
index 79d21b6..7f3eed4 100644
--- a/notes.h
+++ b/notes.h
@@ -1,7 +1,10 @@
 #ifndef NOTES_H
 #define NOTES_H
 
+#define NOTES_SHOW_HEADER 1
+#define NOTES_INDENT 2
+
 void get_commit_notes(const struct commit *commit, struct strbuf *sb,
-		const char *output_encoding);
+		const char *output_encoding, int flags);
 
 #endif
diff --git a/pretty.c b/pretty.c
index e25db81..01eadd0 100644
--- a/pretty.c
+++ b/pretty.c
@@ -978,7 +978,8 @@ void pretty_print_commit(enum cmit_fmt fmt, const struct commit *commit,
 		strbuf_addch(sb, '\n');
 
 	if (fmt != ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

The "-m" and "-F" options are already the established method
(in both git-commit and git-tag) to specify a commit/tag message
without invoking the editor. This patch teaches "git notes edit"
to respect the same options for specifying a notes message without
invoking the editor.

Multiple "-m" and/or "-F" options are concatenated as separate
paragraphs.

The patch also updates the "git notes" documentation and adds
selftests for the new functionality. Unfortunately, the added
selftests include a couple of lines with trailing whitespace
(without these the test will fail). This may cause git to warn
about "whitespace errors".

This patch has been improved by the following contributions:
- Thomas Rast: fix trailing whitespace in t3301

Signed-off-by: Johan Herland <johan@herland.net>
---
 Documentation/git-notes.txt |   16 ++++++++++-
 git-notes.sh                |   64 +++++++++++++++++++++++++++++++++++++-----
 t/t3301-notes.sh            |   36 ++++++++++++++++++++++++
 3 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-notes.txt b/Documentation/git-notes.txt
index 7136016..94cceb1 100644
--- a/Documentation/git-notes.txt
+++ b/Documentation/git-notes.txt
@@ -8,7 +8,7 @@ git-notes - Add/inspect commit notes
 SYNOPSIS
 --------
 [verse]
-'git-notes' (edit | show) [commit]
+'git-notes' (edit [-F <file> | -m <msg>] | show) [commit]
 
 DESCRIPTION
 -----------
@@ -33,6 +33,20 @@ show::
 	Show the notes for a given commit (defaults to HEAD).
 
 
+OPTIONS
+-------
+-m <msg>::
+	Use the given note message (instead of prompting).
+	If multiple `-m` (or `-F`) options are given, their
+	values are concatenated as separate paragraphs.
+
+-F <file>::
+	Take the note message from the given file.  Use '-' to
+	read the note message from the standard input.
+	If multiple `-F` (or `-m`) options are given, their
+	values are concatenated as separate paragraphs.
+
+
 Author
 ------
 Written by Johannes Schindelin ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |  104 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 104 insertions(+), 0 deletions(-)
 create mode 100755 t/t3303-notes-subtrees.sh

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
new file mode 100755
index 0000000..cbb9d35
--- /dev/null
+++ b/t/t3303-notes-subtrees.sh
@@ -0,0 +1,104 @@
+#!/bin/sh
+
+test_description='Test commit notes organized in subtrees'
+
+. ./test-lib.sh
+
+number_of_commits=100
+
+start_note_commit () {
+	test_tick &&
+	cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+notes
+COMMIT
+
+from refs/notes/commits^0
+deleteall
+INPUT_END
+
+}
+
+verify_notes () {
+	git log | grep "^    " > output &&
+	i=$number_of_commits &&
+	while [ $i -gt 0 ]; do
+		echo "    commit #$i" &&
+		echo "    note for commit #$i" &&
+		i=$(($i-1));
+	done > expect &&
+	test_cmp expect output
+}
+
+test_expect_success "setup: create $number_of_commits commits" '
+
+	(
+		nr=0 &&
+		while [ $nr -lt $number_of_commits ]; do
+			nr=$(($nr+1)) &&
+			test_tick &&
+			cat <<INPUT_END
+commit refs/heads/master
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+commit #$nr
+COMMIT
+
+M 644 inline file
+data <<EOF
+file in commit #$nr
+EOF
+
+INPUT_END
+
+		done &&
+		test_tick &&
+		cat <<INPUT_END
+commit refs/notes/commits
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+no notes
+COMMIT
+
+deleteall
+
+INPUT_END
+
+	) |
+	git fast-import --quiet &&
+	git config core.notesRef refs/notes/commits
+'
+
+test_sha1_based () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			note_path=$(echo "$sha1" | sed "$1")
+			cat <<INPUT_END &&
+M 100644 inline $note_path
+data <<EOF
+note for ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Created by a simple cleanup and rename of lookup_notes().

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |   15 ++++++++-------
 notes.h |    3 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/notes.c b/notes.c
index 49a3e86..2196a5f 100644
--- a/notes.c
+++ b/notes.c
@@ -377,12 +377,13 @@ void add_note(const unsigned char *object_sha1, const unsigned char *note_sha1)
 	note_tree_insert(&root_node, 0, l, PTR_TYPE_NOTE);
 }
 
-static unsigned char *lookup_notes(const unsigned char *object_sha1)
+const unsigned char *get_note(const unsigned char *object_sha1)
 {
-	struct leaf_node *found = note_tree_find(&root_node, 0, object_sha1);
-	if (found)
-		return found->val_sha1;
-	return NULL;
+	struct leaf_node *found;
+
+	assert(initialized);
+	found = note_tree_find(&root_node, 0, object_sha1);
+	return found ? found->val_sha1 : NULL;
 }
 
 void free_notes(void)
@@ -396,7 +397,7 @@ void format_note(const unsigned char *object_sha1, struct strbuf *sb,
 		const char *output_encoding, int flags)
 {
 	static const char utf8[] = "utf-8";
-	unsigned char *sha1;
+	const unsigned char *sha1;
 	char *msg, *msg_p;
 	unsigned long linelen, msglen;
 	enum object_type type;
@@ -404,7 +405,7 @@ void format_note(const unsigned char *object_sha1, struct strbuf *sb,
 	if (!initialized)
 		init_notes(NULL, 0);
 
-	sha1 = lookup_notes(object_sha1);
+	sha1 = get_note(object_sha1);
 	if (!sha1)
 		return;
 
diff --git a/notes.h b/notes.h
index 5f22852..21a8930 100644
--- a/notes.h
+++ b/notes.h
@@ -25,6 +25,9 @@ void init_notes(const char *notes_ref, int flags);
 void add_note(const unsigned char *object_sha1,
 		const unsigned char *note_sha1);
 
+/* Get the note object SHA1 containing the note data for the given object */
+const unsigned char *get_note(const unsigned char *object_sha1);
+
 /* Free (and de-initialize) the internal notes tree structure */
 void free_notes(void);
 
-- 
1.6.4.304.g1365c.dirty

--

From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |   11 +++++++++++
 notes.h |    4 ++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/notes.c b/notes.c
index f2bacbb..49a3e86 100644
--- a/notes.c
+++ b/notes.c
@@ -366,6 +366,17 @@ void init_notes(const char *notes_ref, int flags)
 	load_subtree(&root_tree, &root_node, 0);
 }
 
+void add_note(const unsigned char *object_sha1, const unsigned char *note_sha1)
+{
+	struct leaf_node *l;
+
+	assert(initialized);
+	l = (struct leaf_node *) xmalloc(sizeof(struct leaf_node));
+	hashcpy(l->key_sha1, object_sha1);
+	hashcpy(l->val_sha1, note_sha1);
+	note_tree_insert(&root_node, 0, l, PTR_TYPE_NOTE);
+}
+
 static unsigned char *lookup_notes(const unsigned char *object_sha1)
 {
 	struct leaf_node *found = note_tree_find(&root_node, 0, object_sha1);
diff --git a/notes.h b/notes.h
index 6b52799..5f22852 100644
--- a/notes.h
+++ b/notes.h
@@ -21,6 +21,10 @@
  */
 void init_notes(const char *notes_ref, int flags);
 
+/* Add the given note object to the internal notes tree structure */
+void add_note(const unsigned char *object_sha1,
+		const unsigned char *note_sha1);
+
 /* Free (and de-initialize) the internal notes tree structure */
 void free_notes(void);
 
-- 
1.6.4.304.g1365c.dirty

--

From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Currently, having multiple notes referring to the same commit from various
locations in the notes tree is strongly discouraged, since only one of those
notes will be parsed and shown.

This patch teaches the notes code to _concatenate_ multiple notes that
annotate the same commit. Notes are concatenated by creating a new blob
object containing the concatenation of the notes in question, and
replacing them with the concatenated note in the internal notes tree
structure.

Getting the concatenation right requires being more proactive in unpacking
subtree entries in the internal notes tree structure, so that we don't return
a note prematurely (i.e. before having found all other notes that annotate
the same object). As such, this patch may incur a small performance penalty.

Suggested-by: Sam Vilain <sam@vilain.net>
Re-suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  243 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 161 insertions(+), 82 deletions(-)

diff --git a/notes.c b/notes.c
index 210c4b2..50a4672 100644
--- a/notes.c
+++ b/notes.c
@@ -59,115 +59,196 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 		unsigned int n);
 
 /*
- * To find a leaf_node:
+ * Search the tree until the appropriate location for the given key is found:
  * 1. Start at the root node, with n = 0
- * 2. Use the nth nibble of the key as an index into a:
- *    - If a[n] is an int_node, recurse into that node and increment n
- *    - If a leaf_node with matching key, return leaf_node (assert note entry)
+ * 2. If a[0] at the current level is a matching subtree entry, unpack that
+ *    subtree entry and remove it; restart search at the current level.
+ * 3. Use the nth nibble of the key as an index into a:
+ *    - If a[n] is an int_node, recurse from #2 into that node and increment n
  *    - If a matching subtree entry, unpack that subtree entry (and remove ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

This patch teaches 'git fast-import' to use the notes API to organize
the manipulation of note objects through a fast-import stream. Note
objects are added to the notes tree through the 'N' command, and when
we're about to store the tree object for the current commit, we walk
through the notes tree and insert all the notes into the stored tree.

Signed-off-by: Johan Herland <johan@herland.net>
---
 fast-import.c          |   98 ++++++++++++++++++++++++++++--
 t/t9300-fast-import.sh |  156 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 235 insertions(+), 19 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index fcdcfaa..5837875 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -156,6 +156,7 @@ Format of STDIN stream:
 #include "csum-file.h"
 #include "quote.h"
 #include "exec_cmd.h"
+#include "notes.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -246,6 +247,7 @@ struct branch
 	struct tree_entry branch_tree;
 	uintmax_t last_commit;
 	unsigned active : 1;
+	unsigned has_notes : 1;
 	unsigned pack_id : PACK_ID_BITS;
 	unsigned char sha1[20];
 };
@@ -277,6 +279,11 @@ struct recent_command
 	char *buf;
 };
 
+struct notes_tree_list {
+	struct notes_tree tree;
+	struct notes_tree_list *next;
+};
+
 /* Configured limits on output */
 static unsigned long max_depth = 10;
 static off_t max_packsize = (1LL << 32) - 1;
@@ -345,6 +352,9 @@ static struct branch *active_branches;
 static struct tag *first_tag;
 static struct tag *last_tag;
 
+/* Notes data */
+static struct notes_tree_list *notes_trees;
+
 /* Input stream parsing */
 static whenspec_type whenspec = WHENSPEC_RAW;
 static struct strbuf command_buf = STRBUF_INIT;
@@ -2060,7 +2070,7 @@ static void file_change_cr(struct branch *b, int rename)
 		leaf.tree);
 }
 
-static void note_change_n(struct branch *b)
+static void note_change_n(struct branch *b, struct notes_tree *notes)
 {
 	const char *p = command_buf.buf + 2;
 	static struct strbuf uq ...
From: Shawn O. Pearce
Date: Friday, October 9, 2009 - 7:25 am

Some high level comments about this patch:

- You don't destroy the struct notes_tree during unload_one_branch()
  which means notes trees stay in memory even if the branch table
  is overflowing.  I think you should discard the notes tree when
  a branch unloads, and recreate it when the branch loads.

- Destroying and adding back all notes is OK with ~20k notes, but
  doing that with ~150k-~800k notes is going to slow down a lot,
  losing the "fast" part.

-- 
Shawn.
--

From: Johan Herland
Date: Thursday, November 19, 2009 - 6:43 pm

Thanks for the comments. I've tried to address them in the 8th iteration of 
this series (Patch 8/10 to be more precise), just submitted to the mailing 
list.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
--

From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.

Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:

- combine_notes_concatenate() combines the two notes by appending the
  contents of the new note to the contents of the existing note.

- combine_notes_overwrite() replaces the existing note with the new note.

- combine_notes_ignore() keeps the existing note, and ignores the new note.

A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.

A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  132 +++++++++++++++++++++++++++++++++++---------------------------
 notes.h |   34 +++++++++++++++-
 2 files changed, 106 insertions(+), 60 deletions(-)

diff --git a/notes.c b/notes.c
index a5d9736..19ae492 100644
--- a/notes.c
+++ b/notes.c
@@ -127,55 +127,12 @@ static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
 	return NULL;
 }
 
-/* Create a new blob object by concatenating the two given blob objects */
-static int ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

There is really no reason why only commit objects can be annotated. By
changing the struct commit parameter to get_commit_notes() into a sha1 we
gain the ability to annotate any object type. To reflect this in the function
naming as well, we rename get_commit_notes() to format_note().

This patch also fixes comments and variable names throughout notes.c as a
consequence of the removal of the unnecessary 'commit' restriction.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c  |   33 ++++++++++++++++-----------------
 notes.h  |   11 ++++++++++-
 pretty.c |    8 ++++----
 3 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/notes.c b/notes.c
index 50a4672..0f7082f 100644
--- a/notes.c
+++ b/notes.c
@@ -1,5 +1,4 @@
 #include "cache.h"
-#include "commit.h"
 #include "notes.h"
 #include "refs.h"
 #include "utf8.h"
@@ -25,10 +24,10 @@ struct int_node {
 /*
  * Leaf nodes come in two variants, note entries and subtree entries,
  * distinguished by the LSb of the leaf node pointer (see above).
- * As a note entry, the key is the SHA1 of the referenced commit, and the
+ * As a note entry, the key is the SHA1 of the referenced object, and the
  * value is the SHA1 of the note object.
  * As a subtree entry, the key is the prefix SHA1 (w/trailing NULs) of the
- * referenced commit, using the last byte of the key to store the length of
+ * referenced object, using the last byte of the key to store the length of
  * the prefix. The value is the SHA1 of the tree object containing the notes
  * subtree.
  */
@@ -211,7 +210,7 @@ static void note_tree_insert(struct int_node *tree, unsigned char n,
 				if (concatenate_notes(l->val_sha1,
 						entry->val_sha1))
 					die("failed to concatenate note %s "
-					    "into note %s for commit %s",
+					    "into note %s for object %s",
 					    sha1_to_hex(entry->val_sha1),
 					    sha1_to_hex(l->val_sha1),
 					    sha1_to_hex(l->key_sha1));
@@ -299,7 +298,7 @@ static int get_sha1_hex_segment(const char ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

The new struct notes_tree encapsulates access to a specific notes tree.
It is provided to allow callers to interface with several different notes
trees simultaneously.

A struct notes_tree * parameter is added to every function in the notes API.
In all cases, NULL can be passed, in which case, a falback "default" notes
tree (declared in notes.c) is used.

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c  |   67 ++++++++++++++++++++++++++++++++++++++-----------------------
 notes.h  |   57 +++++++++++++++++++++++++++++++++++++--------------
 pretty.c |    4 +-
 3 files changed, 85 insertions(+), 43 deletions(-)

diff --git a/notes.c b/notes.c
index 9581b98..a5d9736 100644
--- a/notes.c
+++ b/notes.c
@@ -50,9 +50,7 @@ struct leaf_node {
 #define SUBTREE_SHA1_PREFIXCMP(key_sha1, subtree_sha1) \
 	(memcmp(key_sha1, subtree_sha1, subtree_sha1[19]))
 
-static struct int_node root_node;
-
-static int initialized;
+static struct notes_tree default_tree;
 
 static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 		unsigned int n);
@@ -434,14 +432,15 @@ redo:
 	return 0;
 }
 
-void init_notes(const char *notes_ref, int flags)
+void init_notes(struct notes_tree *t, const char *notes_ref, int flags)
 {
 	unsigned char sha1[20], object_sha1[20];
 	unsigned mode;
 	struct leaf_node root_tree;
 
-	assert(!initialized);
-	initialized = 1;
+	if (!t)
+		t = &default_tree;
+	assert(!t->initialized);
 
 	if (!notes_ref) {
 		const char *env = getenv(GIT_NOTES_REF_ENVIRONMENT);
@@ -451,6 +450,10 @@ void init_notes(const char *notes_ref, int flags)
 			notes_ref = GIT_NOTES_DEFAULT_REF;
 	}
 
+	t->root = (struct int_node *) xcalloc(sizeof(struct int_node), 1);
+	t->ref = notes_ref ? xstrdup(notes_ref) : NULL;
+	t->initialized = 1;
+
 	if (flags & NOTES_INIT_EMPTY || !notes_ref ||
 	    read_ref(notes_ref, object_sha1) ||
 	    get_tree_entry(object_sha1, "", sha1, &mode))
@@ -458,44 +461,56 @@ void init_notes(const char *notes_ref, int ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

Also verify that multiple references to the _same_ note blob are _not_
concatenated.

Signed-off-by: Johan Herland <johan@herland.net>
---
 t/t3303-notes-subtrees.sh |   84 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/t/t3303-notes-subtrees.sh b/t/t3303-notes-subtrees.sh
index cbb9d35..edc4bc8 100755
--- a/t/t3303-notes-subtrees.sh
+++ b/t/t3303-notes-subtrees.sh
@@ -101,4 +101,88 @@ test_expect_success 'verify notes in 4/36-fanout' 'verify_notes'
 test_expect_success 'test notes in 2/2/36-fanout' 'test_sha1_based "s|^\(..\)\(..\)|\1/\2/|"'
 test_expect_success 'verify notes in 2/2/36-fanout' 'verify_notes'
 
+test_same_notes () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			first_note_path=$(echo "$sha1" | sed "$1")
+			second_note_path=$(echo "$sha1" | sed "$2")
+			cat <<INPUT_END &&
+M 100644 inline $second_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+M 100644 inline $first_note_path
+data <<EOF
+note for commit #$nr
+EOF
+
+INPUT_END
+
+			nr=$(($nr-1))
+		done
+	) |
+	git fast-import --quiet
+}
+
+test_expect_success 'test same notes in 4/36-fanout and 2/38-fanout' 'test_same_notes "s|^..|&/|" "s|^....|&/|"'
+test_expect_success 'verify same notes in 4/36-fanout and 2/38-fanout' 'verify_notes'
+
+test_expect_success 'test same notes in 2/38-fanout and 2/2/36-fanout' 'test_same_notes "s|^\(..\)\(..\)|\1/\2/|" "s|^..|&/|"'
+test_expect_success 'verify same notes in 2/38-fanout and 2/2/36-fanout' 'verify_notes'
+
+test_expect_success 'test same notes in 4/36-fanout and 2/2/36-fanout' 'test_same_notes "s|^\(..\)\(..\)|\1/\2/|" "s|^....|&/|"'
+test_expect_success 'verify same notes in 4/36-fanout and 2/2/36-fanout' 'verify_notes'
+
+test_concatenated_notes () {
+	(
+		start_note_commit &&
+		nr=$number_of_commits &&
+		git rev-list refs/heads/master |
+		while read sha1; do
+			first_note_path=$(echo ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

This includes a first attempt at creating an optimal fanout scheme (which
is created on-the-fly, while traversing).

Signed-off-by: Johan Herland <johan@herland.net>
---
 notes.c |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 notes.h |    9 +++++
 2 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/notes.c b/notes.c
index 2196a5f..9581b98 100644
--- a/notes.c
+++ b/notes.c
@@ -339,6 +339,101 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
 	free(buf);
 }
 
+/*
+ * Determine optimal on-disk fanout for this part of the notes tree
+ *
+ * Given a (sub)tree and the level in the internal tree structure, determine
+ * whether or not the given existing fanout should be expanded for this
+ * (sub)tree.
+ *
+ * Values of the 'fanout' variable:
+ * - 0: No fanout (all notes are stored directly in the root notes tree)
+ * - 1: 2/38 fanout
+ * - 2: 2/2/36 fanout
+ * - 3: 2/2/2/34 fanout
+ * etc.
+ */
+static unsigned char determine_fanout(struct int_node *tree, unsigned char n,
+		unsigned char fanout)
+{
+	/*
+	 * The following is a simple heuristic that works well in practice:
+	 * For each even-numbered 16-tree level (remember that each on-disk
+	 * fanout level corresponds to two 16-tree levels), peek at all 16
+	 * entries at that tree level. If any of them are subtree entries, then
+	 * there are likely plenty of notes below this level, so we return an
+	 * incremented fanout immediately. Otherwise, we return an incremented
+	 * fanout only if all of the entries at this level are int_nodes.
+	 */
+	unsigned int i;
+	if ((n % 2) || (n > 2 * fanout))
+		return fanout;
+	for (i = 0; i < 16; i++) {
+		switch(GET_PTR_TYPE(tree->a[i])) {
+		case PTR_TYPE_SUBTREE:
+			return fanout + 1;
+		case PTR_TYPE_INTERNAL:
+			continue;
+		default:
+			return fanout;
+		}
+	}
+	return fanout + 1;
+}
+
+static void construct_path_with_fanout(const unsigned char *sha1,
+		unsigned char fanout, char ...
From: Johan Herland
Date: Friday, October 9, 2009 - 3:22 am

The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
  objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
  no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
  only the subtree with the longest matching prefix is consulted. This
  means that if the given commit is e.g. "deadbeef", and the notes tree have
  subtrees "de" and "dead", then the following paths in the notes tree are
  searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
  from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
  acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
  recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.

This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.

The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).

In addition to this, comes the massive performance win associated ...
Previous thread: [RFC/PATCHv7 06/22] fast-import: Add support for importing commit notes by Johan Herland on Friday, October 9, 2009 - 3:22 am. (1 message)

Next thread: [PATCH] git-svn: hide find_parent_branch output in double quiet mode by Simon Arlott on Friday, October 9, 2009 - 5:21 am. (3 messages)