Re: git annotate runs out of memory

From: Linus Torvalds
Date: Tuesday, December 11, 2007 - 5:02 pm

On Tue, 11 Dec 2007, Linus Torvalds wrote:

Ok, I lied.

Nothing is beyond my skills. My mad k0der skillz are unbeatable.

This speeds up git-blame on ChangeLog-style files a lot, by just ignoring
the common tail of the two files before diffing them. We don't care about
that part, since we ask for zero lines of context at that point anyway.
So I now get:

	[torvalds@woody gcc]$ time git blame gcc/ChangeLog > /dev/null

	real    0m7.031s
	user    0m6.852s
	sys     0m0.180s

which seems quite reasonable, and is about three times faster than trying 
to diff those big files.
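
To see why this helps for a file that only grows at the top, here is a minimal standalone sketch of the same tail-trimming loop as in the patch below. `struct buf` is a hypothetical stand-in for xdiff's `mmfile_t` (just a pointer and a size), and the return value is an addition for illustration only; the patch itself returns nothing.

```c
#include <assert.h>
#include <string.h>

#define BLOCK 1024

/* Hypothetical stand-in for xdiff's mmfile_t: a pointer plus a byte count. */
struct buf {
	const char *ptr;
	long size;
};

/*
 * Drop identical trailing data from both buffers, BLOCK bytes at a time.
 * Returns the number of bytes trimmed from each buffer (illustration only).
 * At least one byte is always left in each buffer, because the loop stops
 * as soon as either remaining length would reach zero.
 */
static long trim_common_tail(struct buf *a, struct buf *b)
{
	long l1 = a->size, l2 = b->size, trimmed = 0;

	assert(l1 >= 0 && l2 >= 0);
	while ((l1 -= BLOCK) > 0 && (l2 -= BLOCK) > 0) {
		/* Compare the last untrimmed BLOCK bytes of each buffer. */
		if (memcmp(a->ptr + l1, b->ptr + l2, BLOCK))
			break;
		a->size = l1;
		b->size = l2;
		trimmed += BLOCK;
	}
	return trimmed;
}
```

With a 4096-byte old version and a new version that has 100 bytes prepended, three blocks match at the end, so the diff only has to look at 1024 and 1124 bytes. The cut can land in the middle of a line, but it lands at the same distance from the end in both buffers, so the partial last lines still compare equal or different exactly as the full lines would.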

Davide: this really _does_ make a huge difference. Maybe xdiff itself 
should do this optimization on its own, rather than have the caller hack 
around the fact that xdiff doesn't handle this common case all that well?

The same thing obviously works for the beginning-of-file too, but then you
have to play games with line numbers being affected etc, so the end is by
far the easier case, and it is the case that a ChangeLog-style file cares
about.
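
For comparison, a rough sketch of what the beginning-of-file variant might look like. This is not part of the patch: `trim_common_head` and `struct buf` are hypothetical names, and the caller is assumed to add the returned line count back to every line number the diff reports. It skips only whole lines, which is exactly the bookkeeping the tail case gets to avoid.

```c
#include <assert.h>

/* Hypothetical stand-in for xdiff's mmfile_t: a pointer plus a byte count. */
struct buf {
	const char *ptr;
	long size;
};

/*
 * Skip the leading lines that are byte-for-byte identical in both buffers.
 * Only complete lines are skipped, so the returned count is the offset the
 * caller would have to add to line numbers in the remaining data.
 */
static long trim_common_head(struct buf *a, struct buf *b)
{
	long max = a->size < b->size ? a->size : b->size;
	long i, line_start = 0, lines = 0;

	assert(a->size >= 0 && b->size >= 0);
	for (i = 0; i < max && a->ptr[i] == b->ptr[i]; i++) {
		if (a->ptr[i] == '\n') {
			/* A whole common line ends here; remember where the next starts. */
			line_start = i + 1;
			lines++;
		}
	}
	a->ptr += line_start;
	a->size -= line_start;
	b->ptr += line_start;
	b->size -= line_start;
	return lines;
}
```

Given "one\ntwo\nthree\n" and "one\ntwo\nTHREE\nfour\n", this skips two lines and leaves the buffers starting at "three" and "THREE", so a change reported at line 1 of the trimmed data is really line 3 of the file.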

Daniel, this is obviously on top of the patches that fix the memory leak.

			Linus

---
diff --git a/builtin-blame.c b/builtin-blame.c
index c158d31..677188c 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -543,6 +551,20 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	return state.ret;
 }
 
+#define BLOCK 1024
+
+static void truncate_common_data(mmfile_t *a, mmfile_t *b)
+{
+	long l1 = a->size, l2 = b->size;
+
+	while ((l1 -= BLOCK) > 0 && (l2 -= BLOCK) > 0) {
+		if (memcmp(a->ptr + l1, b->ptr + l2, BLOCK))
+			break;
+		a->size = l1;
+		b->size = l2;
+	}
+}
+
 /*
  * Run diff between two origins and grab the patch output, so that
  * we can pass blame for lines origin is currently suspected for
@@ -557,6 +579,7 @@ static struct patch *get_patch(struct origin *parent, struct origin *origin)
 	fill_origin_blob(origin, &file_o);
 	if (!file_p.ptr || !file_o.ptr)
 		return NULL;
+	truncate_common_data(&file_p, &file_o);
 	patch = compare_buffer(&file_p, &file_o, 0);
 	num_get_patch++;
 	return patch;