On Sun, Feb 14, 2010 at 2:06 PM, Dmitry Potapov <dpotapov@gmail.com> wrote:
Well, the numbers are rather easy to calculate of course. On a 32-bit
machine, your (ideal) maximum address space size is 4GB. On a 64-bit
machine, it's a heck of a lot bigger. And in either case, a single
process consuming it all doesn't matter since it won't hurt other
processes. But the available RAM is frequently less than 4GB and that
has to be shared between *all* your processes.
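The arithmetic behind those figures is below. It assumes byte-addressable memory and the full theoretical address space. In practice the OS and CPU usually expose less than this to a single process.

```latex
\begin{aligned}
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} = 4\ \text{GiB} \\
2^{64}\ \text{bytes} &= 18\,446\,744\,073\,709\,551\,616\ \text{bytes} \approx 1.8 \times 10^{19}\ \text{bytes} = 16\ \text{EiB}
\end{aligned}
```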
It definitely doesn't resolve all the issues. There are different
ways of looking at this; one is to not bother making git-add work
smoothly with large files, because calculating the deltas will later
cause a disastrous meltdown anyway. In fact, arguably you should
prevent git-add from adding large files at all, because at least then
you don't get the repository into a hard-to-recover-from state with
huge files. (This happened at work a few months ago; most people have
no idea what to do in such a situation.)
The other way to look at it is that if we want git to *eventually*
work with huge files, we have to fix each bug one at a time, and we
can't go making things worse.
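If one took the "refuse large files" route, the check itself is simple. Below is a minimal sketch in Python that a hypothetical wrapper around git-add could run. The 100 MiB limit and the helper names `oversized` and `check_before_add` are assumptions made for illustration. Neither is an existing git option or hook.

```python
import os
from typing import Iterable, List

# Hypothetical policy threshold; git itself defines no such limit here.
MAX_ADD_BYTES = 100 * 1024 * 1024  # 100 MiB

def oversized(paths: Iterable[str], limit: int = MAX_ADD_BYTES) -> List[str]:
    """Return the paths whose on-disk size exceeds `limit` bytes."""
    return [path for path in paths if os.path.getsize(path) > limit]

def check_before_add(paths: Iterable[str], limit: int = MAX_ADD_BYTES) -> None:
    """Refuse the whole add (exit with a message) if any file is too big."""
    bad = oversized(list(paths), limit)
    if bad:
        raise SystemExit("refusing to add large files: " + ", ".join(bad))
```

Refusing up front is the cheap option. It never lets the repository reach the hard-to-recover state described above, at the cost of simply not supporting huge files.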
For my own situation, I think I'm (and I know people who are) more
likely to try storing huge files in git than to modify a file *while*
I'm trying to store it in git.
I have a bit of experience splitting files into chunks:
http://groups.google.com/group/bup-list/browse_thread/thread/812031efd4c5f7e4
It works. Also note that the speed gain from mmap'ing packs appears
to be much less than the gain from mmap'ing indexes. You could
probably sacrifice most or all of the former and never really notice.
Caching expanded deltas can be pretty valuable, though. (bup
presently avoids that whole question by not using deltas.)
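Chunk splitting of this kind can be sketched as content-defined chunking. A rolling checksum over the last few bytes picks the cut points. Inserting or deleting bytes then only disturbs the chunks near the edit, and later chunks (and their hashes) stay the same. The Python below is an illustration under stated assumptions, loosely modelled on rsync's weak rolling checksum. The 64-byte window, the 13-bit boundary mask (roughly 8 KiB average chunks) and the function names are choices for this sketch. They are not necessarily bup's actual parameters or code.

```python
from typing import List

WINDOW = 64               # bytes covered by the rolling checksum (assumed)
BITS = 13                 # cut when the low BITS bits are all ones (~8 KiB avg)
MASK = (1 << BITS) - 1
MOD = 1 << 16             # both running sums are kept modulo 2**16

def chunk_boundaries(data: bytes) -> List[int]:
    """Return the end offset of every content-defined chunk in `data`."""
    if not data:
        return []
    window = bytearray(WINDOW)    # circular buffer, primed with zeros
    a = b = 0                     # a: plain sum, b: position-weighted sum
    ends = []
    for i, byte in enumerate(data):
        old = window[i % WINDOW]  # byte falling out of the window
        window[i % WINDOW] = byte
        # rsync-style roll: drop `old`, add `byte`, re-weight the rest
        a = (a - old + byte) % MOD
        b = (b - WINDOW * old + a) % MOD
        digest = (a << 16) | b
        if (digest & MASK) == MASK:
            ends.append(i + 1)    # cut right after this byte
    if ends[-1:] != [len(data)]:
        ends.append(len(data))    # whatever is left is the final chunk
    return ends

def split_chunks(data: bytes) -> List[bytes]:
    """Split `data` at the offsets found by chunk_boundaries()."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Each cut depends only on the bytes inside the window. Prefixing the data with a few bytes can move the early cut points, but every chunk after the first boundary past the edit comes out identical. That is what lets a chunked store deduplicate them.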
I can also confirm that streaming objects directly into packs is a
massive performance increase when dealing with big files. However,
you then start to run into git's heuristics that often assume (for
example) that if an object is in a pack, it should never (or rarely)
be pruned. This is normally a fine assumption, because if it were
likely to be pruned, it probably would never have been put into a
pack in the first place.
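Streaming an object straight into a pack can be sketched in Python. In the classic SHA-1 object format, git names a blob by the SHA-1 of `blob <size>\0` followed by the content. A pack entry starts with a small header holding a 3-bit type and the size in variable-length groups, followed by zlib-deflated data. The helper names `pack_entry_header` and `stream_blob_into_pack` are hypothetical. The sketch writes a single undeltified entry only. It omits the `PACK` signature, version, object count, trailing checksum and `.idx` file, all of which a real pack needs.

```python
import hashlib
import zlib
from typing import BinaryIO

OBJ_BLOB = 3  # type code git's pack format uses for blob entries

def pack_entry_header(obj_type: int, size: int) -> bytes:
    """Encode a pack entry header: type in bits 4-6 of the first byte,
    size as little-endian 4-bit then 7-bit groups, MSB = 'more follows'."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)   # continuation bit: another size byte follows
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def stream_blob_into_pack(src: BinaryIO, size: int, dst: BinaryIO,
                          bufsize: int = 1 << 16) -> str:
    """Write one undeltified blob entry to `dst`, reading `src` piecewise.

    Returns the blob's object id (hex SHA-1). At most `bufsize` bytes of
    the input are held in memory at once, whatever the file size.
    """
    sha = hashlib.sha1(b"blob %d\0" % size)  # git hashes "blob <size>\0" + data
    deflate = zlib.compressobj()
    dst.write(pack_entry_header(OBJ_BLOB, size))
    remaining = size
    while remaining:
        piece = src.read(min(bufsize, remaining))
        if not piece:
            raise ValueError("file shrank while it was being added")
        sha.update(piece)
        dst.write(deflate.compress(piece))
        remaining -= len(piece)
    if src.read(1):
        raise ValueError("file grew while it was being added")
    dst.write(deflate.flush())
    return sha.hexdigest()
```

The size appears in both the entry header and the hashed `blob <size>` prefix, so it has to be known before the first byte is written. That is why a file changing while it is being added is awkward for a streaming writer. This sketch simply gives up in that case.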
Have fun,
Avery