Re: [PATCH] don't use mmap() to hash files

From: Avery Pennarun
Date: Sunday, February 14, 2010 - 10:01 pm

On Sun, Feb 14, 2010 at 11:16 PM, Nicolas Pitre <nico@fluxnic.net> wrote:

That would be ideal, but it's more work than, for example, disabling
imports of large files by default, which would be easy.  In any case,
my solution at work was to say "if it hurts, don't do that" and it
seems to have worked out okay for now.


Well, I'm thinking of things like textual database dumps, such as
those produced by mysqldump.  It would be nice to be able to diff
those efficiently, even if they're several gigs in size.  bup's
hierarchical chunking allows this.


Note that bup's rolling-checksum-based hierarchical chunking is not
the same as the chunking that was discussed in that thread, and it
resolves most of the problems.  Unless I'm missing something.
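For anyone who hasn't looked at how rolling-checksum chunking works, here is a rough Python sketch of the idea. It is not bup's code: the 64-byte window, the 13-bit boundary mask, the 4-bit fanout per tree level, and the names `RollingSum` and `split` are assumptions for illustration. The property that matters for something like a mysqldump is that a boundary depends only on the last 64 bytes, so inserting a row near the top of a dump only changes the chunks around the insertion.

```python
import random  # only used by the demo at the bottom

# Assumed parameters; bup's real constants and digest layout may differ.
WINDOW = 64        # bytes covered by the rolling checksum
BLOB_BITS = 13     # boundary when the low 13 digest bits are all ones (~8 KiB chunks)
FANOUT_BITS = 4    # every 4 extra one-bits raise the boundary one tree level
BLOB_MASK = (1 << BLOB_BITS) - 1


class RollingSum:
    """rsync-style two-part checksum over the most recent WINDOW bytes."""

    def __init__(self):
        self.window = bytearray(WINDOW)  # starts as all zeros
        self.pos = 0
        self.s1 = 0  # plain sum of the window bytes (mod 2**16)
        self.s2 = 0  # position-weighted sum of the window bytes (mod 2**16)

    def roll(self, byte):
        """Push one byte in, drop the oldest one, return the 32-bit digest."""
        old = self.window[self.pos]
        self.window[self.pos] = byte
        self.pos = (self.pos + 1) % WINDOW
        self.s1 = (self.s1 + byte - old) & 0xFFFF
        self.s2 = (self.s2 + self.s1 - WINDOW * old) & 0xFFFF
        return (self.s1 << 16) | self.s2


def trailing_ones(x):
    """Count consecutive one-bits starting at the least significant bit."""
    count = 0
    while x & 1:
        count += 1
        x >>= 1
    return count


def split(data):
    """Yield (chunk, level) pairs; level > 0 marks a boundary that would
    also close a higher-level node in a hierarchy of tree objects."""
    rs = RollingSum()
    start = 0
    for i, byte in enumerate(data):
        digest = rs.roll(byte)
        if digest & BLOB_MASK == BLOB_MASK:
            level = (trailing_ones(digest) - BLOB_BITS) // FANOUT_BITS
            yield data[start:i + 1], level
            start = i + 1
    if start < len(data):
        yield data[start:], 0  # trailing partial chunk


if __name__ == "__main__":
    rng = random.Random(1)
    dump = bytes(rng.randrange(256) for _ in range(300_000))
    before = [c for c, _ in split(dump)]
    after = [c for c, _ in split(b"INSERTED ROW\n" + dump)]
    shared = sum(c in after for c in before)
    print(f"{shared} of {len(before)} chunks unchanged after the insert")
```

Since the digest is a function of the window contents alone, chunks that start more than one window past the edit come out byte-for-byte identical, so only a few new blobs need storing or diffing.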

Also note that bup just uses normal tree objects (for better or worse)
instead of introducing a new object type.


Yes, sorry to have implied otherwise.  I was just comparing the
performance advantage of the delta expansion cache (which should be
substantial) with that of mmapping packfiles (which probably isn't
much, since the packfile data is typically needed in expanded form
anyway).
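To show why the expansion cache should matter more than mmap: resolving an object at the end of a delta chain means expanding every base before it, and a cache of recently expanded objects lets the next lookup in the same chain skip most of that work. This is a toy Python sketch, not git's implementation. The "pack" is a hypothetical dict, a delta is modelled as "base plus appended bytes" (real git deltas are copy/insert instructions), and the LRU byte budget and the names `DeltaBaseCache` and `expand` are assumptions.

```python
from collections import OrderedDict


class DeltaBaseCache:
    """Hypothetical LRU cache of expanded objects, bounded by total bytes."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.size = 0
        self.entries = OrderedDict()

    def get(self, key):
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data):
        if key in self.entries:
            self.size -= len(self.entries.pop(key))
        self.entries[key] = data
        self.size += len(data)
        while self.size > self.limit and self.entries:
            _, evicted = self.entries.popitem(last=False)  # drop least recently used
            self.size -= len(evicted)


def expand(pack, offset, cache, stats):
    """Return the full contents of the object at `offset`, walking its delta chain."""
    cached = cache.get(offset)
    if cached is not None:
        return cached
    entry = pack[offset]
    if entry[0] == "base":
        data = entry[1]
    else:
        _, base_offset, suffix = entry  # toy delta: base + suffix
        data = expand(pack, base_offset, cache, stats) + suffix
    stats["expansions"] += 1
    cache.put(offset, data)
    return data


if __name__ == "__main__":
    pack = {0: ("base", b"x" * 10)}
    for i in range(1, 100):
        pack[i] = ("delta", i - 1, b"y")  # a 99-deep delta chain
    for limit in (0, 1 << 20):
        stats = {"expansions": 0}
        cache = DeltaBaseCache(limit)
        expand(pack, 99, cache, stats)
        expand(pack, 98, cache, stats)
        print(f"cache limit {limit}: {stats['expansions']} expansions")
```

With no cache budget the second lookup re-expands 99 objects; with one it costs nothing. Whether the bytes came from mmap or read() doesn't change that count.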


Sorry, I didn't hunt down the code, but I ran into it while
experimenting before.  The rules are something like:

- git-prune only prunes unpacked objects

- git-repack claims to be willing to explode unreachable objects back
into loose objects with -A, but I'm not quite sure whether its
definition of "unreachable" is the same as mine.  And I'm not sure
rewriting a pack with -A reliably makes -d treat the old pack as
unneeded.  It's possible I was just being dense.

- there seems to be no documented situation in which you can ever
delete unused objects from a pack without using repack -a or -A, which
can be amazingly slow if your packs are huge.  (Ideally you'd only
repack the particular packs that you want to shrink.)  For example, my
bup repo is currently 200 GB.
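The "only repack the packs you want to shrink" idea could be planned roughly like this: compute the reachable set once, then rewrite only the packs where unreachable objects make up a large enough share. This is a hypothetical Python sketch, not something `git repack` does today; `reachable_objects`, `packs_worth_rewriting`, the in-memory object graph, and the 25% threshold are all assumptions, and a real tool would also have to count reflogs, the index, and recently written objects as roots.

```python
def reachable_objects(graph, roots):
    """Walk the object graph (object id -> ids it references) from the refs."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))  # blobs have no outgoing references
    return seen


def packs_worth_rewriting(packs, reachable, min_garbage=0.25):
    """Return {pack name: objects to keep} for packs whose unreachable
    fraction is at least min_garbage; all other packs are left untouched."""
    plan = {}
    for name, objects in packs.items():
        if not objects:
            continue
        garbage = objects - reachable
        if len(garbage) / len(objects) >= min_garbage:
            plan[name] = objects & reachable
    return plan


if __name__ == "__main__":
    graph = {"c2": ["c1", "t2"], "c1": ["t1"], "t1": ["b1"],
             "t2": ["b1", "b2"], "old": ["b3"]}
    live = reachable_objects(graph, ["c2"])
    packs = {"pack-a": {"c1", "t1", "b1"},
             "pack-b": {"c2", "t2", "b2", "old", "b3"}}
    print(packs_worth_rewriting(packs, live))
```

Packs below the threshold are never read or rewritten, which is the whole point once the repository is in the hundreds of gigabytes.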

Anyway, I didn't have much luck when playing with it earlier, but
didn't investigate since I assumed it's just a workflow that nobody
much cares about.  Which I think is a reasonable position for git
developers to take anyway.

Have fun,

Avery

Messages in current thread:
[PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sat Feb 13, 6:18 pm)
Re: [PATCH] don't use mmap() to hash files, Junio C Hamano, (Sat Feb 13, 6:37 pm)
Re: mmap with MAP_PRIVATE is useless, Junio C Hamano, (Sat Feb 13, 6:53 pm)
Re: [PATCH] don't use mmap() to hash files, Johannes Schindelin, (Sat Feb 13, 6:53 pm)
Re: [PATCH] don't use mmap() to hash files, Junio C Hamano, (Sat Feb 13, 7:00 pm)
Re: mmap with MAP_PRIVATE is useless, Paolo Bonzini, (Sat Feb 13, 7:11 pm)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sat Feb 13, 7:18 pm)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sat Feb 13, 7:42 pm)
[PATCH v2] don't use mmap() to hash files, Dmitry Potapov, (Sat Feb 13, 8:05 pm)
Re: [PATCH] don't use mmap() to hash files, Junio C Hamano, (Sat Feb 13, 8:14 pm)
Re: [PATCH] don't use mmap() to hash files, Jakub Narebski, (Sun Feb 14, 4:07 am)
Re: [PATCH] don't use mmap() to hash files, Thomas Rast, (Sun Feb 14, 4:14 am)
Re: [PATCH] don't use mmap() to hash files, Junio C Hamano, (Sun Feb 14, 4:46 am)
Re: [PATCH] don't use mmap() to hash files, Paolo Bonzini, (Sun Feb 14, 4:55 am)
Re: [PATCH] don't use mmap() to hash files, Johannes Schindelin, (Sun Feb 14, 11:10 am)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sun Feb 14, 12:06 pm)
Re: [PATCH] don't use mmap() to hash files, Johannes Schindelin, (Sun Feb 14, 12:22 pm)
Re: [PATCH] don't use mmap() to hash files, Johannes Schindelin, (Sun Feb 14, 12:28 pm)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sun Feb 14, 12:55 pm)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Sun Feb 14, 12:56 pm)
Re: [PATCH] don't use mmap() to hash files, Avery Pennarun, (Sun Feb 14, 4:13 pm)
Re: [PATCH] don't use mmap() to hash files, Zygo Blaxell, (Sun Feb 14, 4:52 pm)
Re: [PATCH] don't use mmap() to hash files, Nicolas Pitre, (Sun Feb 14, 9:16 pm)
Re: [PATCH] don't use mmap() to hash files, Avery Pennarun, (Sun Feb 14, 10:01 pm)
Re: [PATCH] don't use mmap() to hash files, Nicolas Pitre, (Sun Feb 14, 10:05 pm)
Re: [PATCH] don't use mmap() to hash files, Nicolas Pitre, (Sun Feb 14, 10:48 pm)
Re: [PATCH] don't use mmap() to hash files, Paolo Bonzini, (Mon Feb 15, 12:48 am)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Mon Feb 15, 5:23 am)
Re: [PATCH] don't use mmap() to hash files, Dmitry Potapov, (Mon Feb 15, 5:25 am)
Re: [PATCH] don't use mmap() to hash files, Avery Pennarun, (Mon Feb 15, 12:19 pm)
Re: [PATCH] don't use mmap() to hash files, Nicolas Pitre, (Mon Feb 15, 12:29 pm)
16 gig, 350,000 file repository, Bill Lear, (Thu Feb 18, 1:11 pm)
Re: 16 gig, 350,000 file repository, Nicolas Pitre, (Thu Feb 18, 1:58 pm)
Re: 16 gig, 350,000 file repository, Erik Faye-Lund, (Fri Feb 19, 2:27 am)