Re: Handling large files with GIT

From: Linus Torvalds
Date: Wednesday, February 8, 2006 - 9:34 am

On Wed, 8 Feb 2006, Johannes Schindelin wrote:

Indeed. The git architecture simply sucks for big objects. It was 
discussed somewhat during the early stages, but a lot of it really is 
pretty fundamental. The fact that all the operations work on a full 
object, and the deltas are (on purpose) just a very specific and limited 
kind of size compression is just very ingrained.


It probably wouldn't help that much, really. And it would probably impact 
source code users too: I bet we'd have bugs. It would be a very strange 
special case.

It also would only help for things that purely grow at the end. Which 
isn't even true for a mailbox: it may or may not be true for your INBOX, 
but anybody who _uses_ a mailbox format to read his email will be adding 
status flags to the mbox format (or deleting mbox entries etc). 

So every time a small change happened that changed the offset, you'd have 
an explosion of these 256kB chunk objects, and while the delta would work 
(probably slowly - remember how the git deltification algorithm tries to 
compare against the ten "nearest" neighbors), at _commit_ time you'd have 
to write that 1GB (compressed) out anyway.
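
As a rough illustration of the offset problem (a sketch only, not anything
git actually does: the fixed-size splitting, the helper names `chunk_ids`
and `shared`, and the fake mailbox contents are all assumptions made for
this example), compare how many 256kB chunks survive an append versus a
one-byte insertion at the front:

```python
import hashlib

CHUNK = 256 * 1024  # 256kB, the chunk size discussed in the thread


def chunk_ids(data, size=CHUNK):
    """Hash each fixed-size chunk of data, in order."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def shared(a, b):
    """Count the chunk IDs that appear in both lists."""
    return len(set(a) & set(b))


# A fake ~2MB "mailbox" made of non-repeating 13-byte records.
mbox = b"".join(b"%012d\n" % i for i in range(2 * 1024 * 1024 // 13))
original = chunk_ids(mbox)

# Pure growth at the end: every full chunk before the tail is reused.
appended = chunk_ids(mbox + b"new mail\n")

# One byte inserted at the front (think: a status flag change near the
# top of the file): every chunk boundary shifts, so nothing is reused.
shifted = chunk_ids(b"X" + mbox)

print(len(original), shared(original, appended), shared(original, shifted))
```

With this input the append case keeps all but the final partial chunk,
while the shifted case shares no chunks with the original at all.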

Realistically, I think the answer is that git just doesn't work for his 
use case. There are two alternatives:

 - convince him to not have big mailboxes (an answer I don't particularly 
   like: it's a tool limitation, and you shouldn't change your behaviour 
   just because the tool doesn't work for it - you should just try to find 
   the right tool).

   That said: git should actually work beautifully for email if you 
   _don't_ keep it as one big mbox. You could probably very reasonably use 
   git as a database backend, where each email is its own object, and you 
   can have many different ways of indexing them into trees (by content, 
   by date, by author, by thread).

   But that's very different from what the suggested "home directory" 
   setup would be.

 - try to work around some of the worst git issues. While I don't think 
   the 256kB blocking thing would help (the git protocol would still 
   always send the base versions), there _are_ probably things that could 
   be done. They'd be very invasive, though, and somebody would seriously 
   have to look at the architectural issues.

   For example, right now the decision to send only "self-contained" packs 
   in the git protocol was a very conscious one: it's much safer, and it 
   makes the unpacking a lot easier (the unpacking doesn't ever have to 
   even read any other objects than the stream it gets). It's also (for 
   packs that we use on-disk) the only sane way to avoid nasty inter-pack 
   dependencies.

   But for the git protocol, the inter-pack dependencies don't matter, 
   if we'd always unpack the thing on reception if it is not a 
   self-contained pack. So we _could_ allow deltas that depend on the 
   receiver already having the objects we delta against.

   However, the deltification itself is likely very slow, exactly because 
   git (again, very much by design) generates the deltas dynamically 
   rather than depending on things already being in delta format.
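
Going back to the first alternative, the "each email is its own object"
idea can be sketched roughly as below. This is an assumption-laden toy, not
a proposed design: the in-memory dicts stand in for git's object store and
trees, and `index_messages` and the sample messages are made up for the
example. The only part taken from git itself is the blob naming (SHA-1 over
a "blob <size>" header, a NUL byte, and the content).

```python
import hashlib
from collections import defaultdict
from email import message_from_string


def git_blob_id(content: bytes) -> str:
    """Name of a git blob: SHA-1 of "blob <size>", a NUL byte, then content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def index_messages(raw_messages):
    """Store each message as its own object and build two toy indexes."""
    store = {}                      # object id -> raw message
    by_author = defaultdict(list)   # From header -> object ids
    by_thread = defaultdict(list)   # thread root Message-ID -> object ids
    for raw in raw_messages:
        oid = git_blob_id(raw.encode("utf-8"))
        store[oid] = raw
        msg = message_from_string(raw)
        by_author[msg["From"]].append(oid)
        # Simplification: a reply is filed under the message it answers.
        by_thread[msg["In-Reply-To"] or msg["Message-ID"]].append(oid)
    return store, by_author, by_thread


first = ("From: alice@example.org\nMessage-ID: <1@example.org>\n"
         "Subject: hi\n\nbody one\n")
reply = ("From: bob@example.org\nMessage-ID: <2@example.org>\n"
         "In-Reply-To: <1@example.org>\nSubject: Re: hi\n\nbody two\n")

store, by_author, by_thread = index_messages([first, reply])
print(len(store), len(by_thread["<1@example.org>"]))
```

Adding a status flag to one message then rewrites one small object instead
of shifting the contents of a single huge mbox blob.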

Personally, I think the answer is "git is good for lots of small files". 
It's very much what git was designed for, and the fact that it doesn't 
work for everything is a trade-off for the things it _does_ work well for.

			Linus