On Tue, 10 Feb 2009, Boyd Stephen Smith Jr. wrote:
That sounds unnecessarily complicated. It also really sucks for the case
you want to optimize: small differences between trees, where you don't
need to even linearize the common parts.
Why not just make it a straight fixed 12-bit prefix, single-level trie?
Sure, if you have fewer than 4k objects, it's going to add an unnecessary
indirection, and close to an extra tree object for each object. But it
should scale pretty well to a fairly huge number of notes. IOW, if you have
less than 2^24 notes (16 million), you'll never have a tree object with
more than 4k entries.
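
A minimal sketch of the bucketing arithmetic above, assuming the prefix is
simply the first 12 bits (three hex digits) of the note's SHA1. The helper
names are hypothetical, and the estimate assumes SHA1s spread evenly over
the buckets, so it is an average rather than a hard bound:

```python
import string

FANOUT_BITS = 12                # fixed prefix width proposed in the message
FANOUT_HEX = FANOUT_BITS // 4   # 12 bits = 3 hex digits


def fanout_prefix(sha1_hex):
    """Return the subtree (bucket) name for a 40-character hex SHA1."""
    if len(sha1_hex) != 40 or any(c not in string.hexdigits for c in sha1_hex):
        raise ValueError("expected a 40-character hex SHA1")
    return sha1_hex[:FANOUT_HEX].lower()


def average_entries_per_subtree(num_notes):
    """Average entries per second-level tree, assuming evenly spread SHA1s."""
    return num_notes / (1 << FANOUT_BITS)
```

With 2^24 notes this gives 4096 top-level entries and, on average, 4096
entries in each subtree, which is where the 16 million figure comes from.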
And with each tree entry being ~70 bytes (40 bytes name, 20 bytes SHA1 +
overhead), the individual tree objects will still be a reasonable(ish)
size. And the fixed depth and prefix size means that merging is trivial
and can use the normal tree merge that avoids touching common subtrees.
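
To illustrate why the fixed layout makes merging cheap, here is a rough
sketch of a three-way merge over the top-level tree only. It is a
simplification and not git's actual merge code: each tree is modelled as a
dict from prefix to subtree ID, and merge_top_level is a hypothetical name.
Subtrees whose IDs match are never opened:

```python
def merge_top_level(base, ours, theirs):
    """Three-way merge of {prefix: subtree_id} maps.

    Returns (merged, needs_recursion). Only prefixes that changed
    differently on both sides need their subtrees opened.
    """
    merged = {}
    needs_recursion = []
    for prefix in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(prefix), ours.get(prefix), theirs.get(prefix)
        if o == t:
            winner = o              # same on both sides (or removed by both)
        elif o == b:
            winner = t              # only their side changed this subtree
        elif t == b:
            winner = o              # only our side changed this subtree
        else:
            needs_recursion.append(prefix)  # both changed: look inside
            continue
        if winner is not None:
            merged[prefix] = winner
    return merged, needs_recursion
```

Because the depth and prefix width are the same on both sides, the same
prefix always names the same bucket, so comparing subtree IDs is enough to
skip everything that did not change.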
The default .git/objects fan-out of just 8 bits might work too, but if
we're thinking millions of notes (which is not entirely unreasonable), it
gets ugly pretty fast. The reason it works ok for git is the repacking.
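
For a rough feel of the difference, the sketch below compares an 8-bit and
a 12-bit fan-out using the ~70 bytes per entry figure from above. The
function name and the 16 million note count are illustrative assumptions,
and SHA1s are assumed to spread evenly:

```python
BYTES_PER_ENTRY = 70  # rough per-entry figure quoted in the message


def subtree_estimate(num_notes, fanout_bits):
    """Return (average entries, approximate bytes) for one subtree."""
    entries = num_notes / (1 << fanout_bits)
    return entries, entries * BYTES_PER_ENTRY


for bits in (8, 12):
    entries, size = subtree_estimate(16_000_000, bits)
    print(f"{bits}-bit fan-out: ~{entries:.0f} entries, "
          f"~{size / 1024:.0f} KiB per subtree")
```

At 16 million notes the 8-bit layout averages about 62,500 entries (over
4 MB) per subtree, against about 3,900 entries (under 300 KB) for 12 bits.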
Linus