Didn't I already say that I have no objection to somebody who
wants to track empty directories? I probably would not do that
myself, but I do not see a reason to forbid it, either.
The right approach to take probably would be to allow entries of
mode 040000 in the index. Traditionally, we allowed only 100644
and 100755 (blobs as regular files) and 120000 (blobs as
symlinks). We recently added 160000 (commit from outer space,
aka subproject).
And we do that for all directories, not just empty ones. So if
you have fileA, empty/, sub/fileB tracked, your index would
probably have these four entries, immediately after read-tree
of an existing tree object:
    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 e125e11d3b63e3... 0 sub
    100644 52054201c2a872... 0 sub/fileB
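
To make the layout above concrete, here is a minimal sketch in Python (not git's C code) that flattens a nested directory into index-like entries and gives every directory, empty or not, its own 040000 entry. The names hash_object, write_tree and flatten are made up for this illustration, and so are the file contents, so only the empty tree comes out with the object name shown in the listing.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 git assigns to an object of this kind."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()

def write_tree(node: dict) -> bytes:
    """Compute the tree object name for a nested dict {name: bytes | dict}."""
    items = []
    for name, child in node.items():
        if isinstance(child, dict):
            # Tree objects record directories with mode "40000".
            items.append((name + "/", name, b"40000", write_tree(child)))
        else:
            items.append((name, name, b"100644", hash_object("blob", child)))
    # Tree entries are sorted as if directory names ended with "/".
    items.sort(key=lambda item: item[0].encode())
    body = b"".join(mode + b" " + name.encode() + b"\0" + sha1
                    for _, name, mode, sha1 in items)
    return hash_object("tree", body)

def flatten(node: dict, prefix: str = "") -> list:
    """List (mode, sha1, stage, path) entries, directories included."""
    entries = []
    for name, child in node.items():
        path = prefix + name
        if isinstance(child, dict):
            entries.append(("040000", write_tree(child).hex(), 0, path))
            entries.extend(flatten(child, path + "/"))
        else:
            entries.append(("100644", hash_object("blob", child).hex(), 0, path))
    return entries

# Hypothetical contents; entries come out in the order the dict lists
# them (a real index keeps its entries sorted by path).
worktree = {"fileA": b"hello\n", "empty": {}, "sub": {"fileB": b"world\n"}}
for mode, sha1, stage, path in flatten(worktree):
    print(mode, sha1, stage, path)
```

The entry for "empty" gets an object name starting with 4b825dc642cb6e, because every empty tree hashes to the same value.
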
Making sure that the empty/ directory exists in the working tree
would probably be done in entry.c; we have been touching that
area in an unrelated thread in the past few days.
If you add sub/fileC with "update-index" (or "add"), you
invalidate the SHA-1 object name stored for "sub" (there is no
point in recomputing the tree object until you know you need the
subtree for the "sub" part, which does not happen until the next
"write-tree"), and end up with something like:
    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 00000000000000... 0 sub
    100644 52054201c2a872... 0 sub/fileB
    100644 705bf16c546f32... 0 sub/fileC
These "missing" SHA-1s would need to be recomputed on demand.
We have had the necessary infrastructure to do this "keeping
untouched tree object names in the index" for quite some time,
but it is not a part of the index proper (it is stored in an
extension section in the index file, to keep the index
compatible with older versions of git).
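
The lazy invalidation described above could be sketched like this in Python. This only illustrates the idea and is not how the existing extension is implemented: the index is modelled as a plain dict from path to (mode, SHA-1), add_path and stale_trees are hypothetical helpers, and the truncated object names from the listings are used as opaque placeholders.

```python
NULL_SHA1 = "0" * 40  # marks a tree entry whose object name is unknown

def add_path(index: dict, path: str, sha1: str, mode: str = "100644") -> None:
    """Add or update a blob entry, invalidating every enclosing directory."""
    index[path] = (mode, sha1)
    parts = path.split("/")
    for depth in range(1, len(parts)):
        # Do not recompute the tree here; just forget its object name.
        index["/".join(parts[:depth])] = ("040000", NULL_SHA1)

def stale_trees(index: dict) -> list:
    """List directories whose tree must be recomputed, deepest first."""
    stale = [path for path, (mode, sha1) in index.items()
             if mode == "040000" and sha1 == NULL_SHA1]
    return sorted(stale, key=lambda path: path.count("/"), reverse=True)

# Placeholder object names copied (still truncated) from the listing.
index = {
    "fileA": ("100644", "15db6f1f27ef7a..."),
    "empty": ("040000", "4b825dc642cb6e..."),
    "sub": ("040000", "e125e11d3b63e3..."),
    "sub/fileB": ("100644", "52054201c2a872..."),
}
add_path(index, "sub/fileC", "705bf16c546f32...")
print(index["sub"])        # the "sub" entry now carries the all-zero name
print(stale_trees(index))  # what the next "write-tree" has to recompute
```

Listing the stale directories deepest first means a "write-tree" style pass can rebuild inner trees before the outer trees that refer to them.
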
Having made it sound so easy, here are the issues I would expect
to be nontrivial (but probably not rocket surgery either).
* unpack-trees, which is the workhorse for twoway merge (aka
"switching branches") and threeway merge, has convoluted
logic to avoid D/F conflicts; it can probably be cleaned up
once we do the above conversion so that the index starts
saying "Hey, I have a directory here" more explicitly. The
end result would probably be code that is easier to follow.
* status, update-index --refresh, and diff-files care about
the information cached in the index from the last time
lstat(2) was run on each entry. What we should store there
for "tree" entries is very unclear to me, but probably we
should teach them to skip the stat-matching logic for
these entries.
* diff-index walks the index and a tree in parallel but does
not currently expect to see a tree object in the index. It
needs to be taught to ignore these "tree" entries.
* merge-recursive and merge-index walk the index, coming up
with the merge results one path at a time. They also need to
be taught to ignore these "tree" entries.
* diff-index and "read-tree -m" should be taught to take
advantage of the "tree" entries in the index. For example,
if diff-index finds that the "tree" entry in the index and
the subtree found in the tree object match exactly, it does
not even have to descend into the tree, which would be a huge
performance win (you do not have to open the subtree and its
subtrees on the tree side; on the index side you have already
read everything, so all that is left is to skip the entries
under that directory). "read-tree -m" also should be able to
optimize away identical subtrees in the two or three trees
involved.
Even if we follow the "lazy invalidate" strategy to maintain
the "tree" entries in the normal codepath, we could have a
special operation that says "now update all the tree entries
by recomputing the tree object names as needed". Perhaps we
might want to run such an operation automatically before
"read-tree -m".
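
The diff-index shortcut from the last item could look roughly like the Python sketch below. The assumptions are heavy: Tree stands for an already-parsed tree object, diff_index is a hypothetical function rather than git's, the index always carries a 040000 entry for each directory, mode changes are ignored, and the object names are the truncated placeholders from the listings above.

```python
from dataclasses import dataclass, field

NULL_SHA1 = "0" * 40

@dataclass
class Tree:
    """Tree side of the comparison: one already-parsed tree object."""
    sha1: str
    blobs: dict = field(default_factory=dict)     # name -> blob SHA-1
    subtrees: dict = field(default_factory=dict)  # name -> Tree

EMPTY = Tree(sha1="")  # stands in for a directory the tree side lacks

def diff_index(index, tree, prefix="", opened=None):
    """Return the set of blob paths that differ between index and tree."""
    if opened is not None:
        opened.append(prefix)  # record which tree objects were opened
    # Index entries that live directly in this directory.
    here = {path[len(prefix):]: entry for path, entry in index.items()
            if path.startswith(prefix) and "/" not in path[len(prefix):]}
    changed = set()
    for name in sorted(set(here) | set(tree.blobs) | set(tree.subtrees)):
        mode, sha1 = here.get(name, (None, None))
        index_blob = sha1 if mode not in (None, "040000") else None
        if index_blob != tree.blobs.get(name):
            changed.add(prefix + name)  # added, deleted or modified blob
        sub = tree.subtrees.get(name)
        if mode == "040000" or sub is not None:
            if mode == "040000" and sub is not None and sha1 == sub.sha1:
                continue  # identical subtree: no need to descend
            changed |= diff_index(index, sub or EMPTY,
                                  prefix + name + "/", opened)
    return changed

head = Tree("root...", blobs={"fileA": "15db6f1f27ef7a..."}, subtrees={
    "empty": Tree("4b825dc642cb6e..."),
    "sub": Tree("e125e11d3b63e3...", blobs={"fileB": "52054201c2a872..."}),
})
index = {
    "fileA": ("100644", "15db6f1f27ef7a..."),
    "empty": ("040000", "4b825dc642cb6e..."),
    "sub": ("040000", NULL_SHA1),  # invalidated by adding sub/fileC
    "sub/fileB": ("100644", "52054201c2a872..."),
    "sub/fileC": ("100644", "705bf16c546f32..."),
}
opened = []
print(diff_index(index, head, opened=opened), opened)
```

In this run only the top-level tree and "sub" are opened; "empty" is skipped because its object name in the index matches the tree side, while the invalidated "sub" entry forces a descent.
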