Re: Empty directories...

Previous thread: Character set for the HTML documentation by H. Peter Anvin on Tuesday, July 17, 2007 - 4:31 pm. (3 messages)

Next thread: git svn dcommit seg fault by Perrin Meyer on Tuesday, July 17, 2007 - 7:51 pm. (3 messages)
From: David Kastrup
Date: Tuesday, July 17, 2007 - 5:13 pm

GIT(7) -- 03/05/2007

NAME
	git - the stupid content tracker


Well, I use git for tracking contents.  That means, for example,
installation trees for some application.  Let's take a typical TeXlive
tree as an example.  Those trees contain, among other things,
directories where new fonts/formats/whatever get placed as things run.
Quite a few of them start out empty, but their permissions have to
correspond to their purpose (for example, some are world-writable).

I see little chance to get this achieved without doing something like

find . -type d -empty -exec touch {}/.git-this-is-empty \;

before every checkin and

find -name .git-this-is-empty -exec rm -- {} +

after every checkout.  Which is pretty stupid.
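
The two find commands above can be wrapped into a pair of helper functions. This is only a sketch: the names mark_empty_dirs and unmark_empty_dirs are made up for illustration, it assumes a find that supports -empty and substitutes {} inside an argument (GNU find does), and it skips .git so that git's own empty directories are left alone.

```shell
# Hypothetical helpers wrapping the placeholder workaround; names are illustrative.
PLACEHOLDER=.git-this-is-empty

# Before a checkin: drop a placeholder file into every empty directory
# below $1, skipping the repository metadata in .git.
mark_empty_dirs() {
    find "$1" -name .git -prune -o -type d -empty \
        -exec touch "{}/$PLACEHOLDER" \;
}

# After a checkout: remove the placeholder files again.
unmark_empty_dirs() {
    find "$1" -name .git -prune -o -type f -name "$PLACEHOLDER" \
        -exec rm -- {} +
}
```

One would call "mark_empty_dirs ." before adding and committing, and "unmark_empty_dirs ." after a checkout.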

As some anecdotal stuff, I did something like

mkdir test
cd test
git-init
touch README
git-add README # another peeve: why is no empty reference point possible?
git-commit -a -m "Initial branch"
git checkout -b newbranch master
unzip ../somearchive -d subdir
git add subdir
git commit -a -m "Add subdir"
git checkout -b newbranch2 master

and expect to have a clean slate.  No such luck: without warning, all
empty directories in the zip file are still remaining within subdir,
which as a consequence has not been cleaned up.

So even if one is of the opinion that empty directories are not worth
putting into the repository: if I check in an entire subdirectory
hierarchy and then switch to a branch where this subdirectory is not
existent, I expect the subdirectory to be _gone_, and not have some
littering of empty directories lying around.
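
The sequence above can be reproduced without the zip archive. The sketch below is a stand-in under stated assumptions: mkdir -p replaces unzip, the directory and function names are invented, and git must be installed (the function skips itself otherwise).

```shell
# Sketch only: repro_empty_dir_leftovers and the paths under subdir/ are made up.
repro_empty_dir_leftovers() {
    command -v git >/dev/null 2>&1 || { echo "git not installed; skipping"; return 0; }
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        git init -q .
        git config user.name "Example"
        git config user.email "example@example.invalid"
        : > README
        git add README
        git commit -q -m "Initial branch"
        base=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

        git checkout -q -b newbranch "$base"
        # Stand-in for "unzip ../somearchive -d subdir": one file, some empty dirs.
        mkdir -p subdir/docs subdir/empty/nested
        echo hello > subdir/docs/file.txt
        git add subdir
        git commit -q -m "Add subdir"

        git checkout -q -b newbranch2 "$base"
        # List whatever directories are still left under subdir.
        find subdir -type d | sort
    )
    status=$?
    rm -rf "$work"
    return $status
}

repro_empty_dir_leftovers
```

The tracked file and its directory subdir/docs should be gone after the last checkout, while subdir, subdir/empty and subdir/empty/nested are still listed.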

And that git-diff can see nothing wrong with that does not really
improve things.

So if git is supposed to be a content tracker, I can't see a way
around it actually being able to track content, and empty directories
_are_ content.  It can't leave them lying around with arbitrary
permissions on them when I switch branches or tags.  And the
workaround using "touch" mentioned above is really awful to do
manually all ...
From: Johannes Schindelin
Date: Tuesday, July 17, 2007 - 5:35 pm

Hi,


If you had the idea already, I wonder why you did not find it.  It's not
really hard to find:

http://git.or.cz/gitwiki/GitFaq#head-1fbd4a018d45259c197b169e87dafce2a3c6b5f9

Ciao,
Dscho

-

From: David Kastrup
Date: Tuesday, July 17, 2007 - 11:07 pm

The FAQ answer is weaseling on several counts:

a) No, git only cares about files, or rather git tracks content and
   empty directories have no content.

In the same manner as empty regular files have no contents, and git
tracks those.  Existence and permissions are important.

b) The problem is not just that empty directories don't get added into
the repository.  They also don't get removed again when switching to a
different checkout.  When git-diff returns zero, I expect a subsequent
checkout to not leave complete empty hierarchies around because git
can't delete any empty leaves which it chose not to track.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Johannes Schindelin
Date: Wednesday, July 18, 2007 - 3:26 am

Hi,


We do not track permissions of directories at all.  This is because Git is 
primarily meant to track source code, and most "permissions" (i.e. 

I _like_ the behaviour that Git does not remove a directory it added, when 
I put some untracked file into it.  And switching back to that branch, Git 
has no problems, because it sees that the directory is already there.  In 
case of a file, it would complain, and rightfully so.

See the fundamental difference between a file and a directory now?  I 
think it boils down to "an empty directory has _no_ contents, but an empty 
file has an _empty_ content".

Ciao,
Dscho

-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 9:23 am

Yes, but directories really are different.

First off, git wouldn't track the permissions anyway (git tracks execute 
bits, but for directories that _has_ to be set or git couldn't use them 
itself, so that's not going to happen).

Second, and much more important, the directories will exist or not 

Bzzt. Wrong.

We *do* remove directories when all files under them go away.

HOWEVER (and this is where one of the reasons for not tracking them comes 
in):

   ** YOU CANNOT REMOVE A DIRECTORY IF IT HAS SOME UNTRACKED CONTENTS **

Think about that for five seconds, then think about it some more. Ponder 
it.

So the fact is, git *already* does as good a job as it could possibly 
do wrt directories that go away: it tries to remove them if all the files 
that are tracked in it have gone away.
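
The rule described here can be mimicked with plain rm and rmdir. This is an illustration only, not git's code: remove_tracked_file is a made-up name, and the point is simply that rmdir refuses to remove a directory that still has (untracked) contents.

```shell
# Illustrative only: the same rule expressed with rm and rmdir.
remove_tracked_file() {
    # $1 = path of a tracked file, relative to the current directory
    rm -f -- "$1"
    dir=$(dirname "$1")
    # Walk upwards; rmdir fails on a non-empty directory, so the loop
    # stops at the first directory that still contains something.
    while [ "$dir" != "." ] && [ "$dir" != "/" ] && rmdir "$dir" 2>/dev/null; do
        dir=$(dirname "$dir")
    done
}
```

If build/obj holds a tracked main.c and an untracked main.o, removing main.c leaves build/obj in place; if src/lib holds only a tracked util.c, removing it takes src/lib and src with it.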

But that leaves a very common case, namely switching to another branch 
without those files, and the directory still having stale object files etc 
build crud in it.

An SCM *must*not* just remove that directory. It would be horrible. The 
fact that it has untracked files in it does not make those untracked files 
"unimportant". Maybe you feel that way about object files, but what about 
tracking some important parts of your home directory - does the fact that 
you don't necessarily track *all* of it mean that the rest is totally 
unimportant and that git should just remove it? HELL NO!

So directories really _are_ problematic. You cannot (and should not) track 
them the same way as you track a file.

And the difference is very fundamental indeed: when you track a regular 
file, you track *all* of its content. But when you track a directory, 
you don't track its content *at*all*.

Think about that, and then think about the fact that git is defined as a 
"content tracker", and it's not "weaselly" at all to say that you don't 
track directories.

So your argument is totally bogus. When you track an empty file, you very 
much track the *content* of that file, and "empty" ...
From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 9:33 am

Btw, don't get me wrong: I think that in order to be better at tracking 
other SCM's idiotic choices, we could (and I foresee that we eventually 
have to) try to track empty directories as a special case too.

So I'm not _against_ the notion of tracking empty directories, and I would 
welcome patches that do so. As I mentioned in some earlier thread when 
this came up a few weeks ago, I actually suspect that the "subproject" 
support probably ended up making it easier, because in many ways an "empty 
directory" is very close to an "anonymous subproject" from a low-level 
plumbing standpoint (even if it is *not* so from a high-level standpoint).

So I suspect that adding support for empty directories ends up being about 
just slightly extending the places that now have subproject support to 
know about a new situation.

But I do want to point out that "tracking a directory" is not at all the 
same thing as "tracking a file", no matter how much you try to argue 
otherwise. The semantics are totally different, and it all boils down to 
the fact that when you track a file, you are always talking about the 
*full* content of the file, while tracking a directory is always about 
tracking just a *subset* of the contents of the directory.

Of course, with directories, there's the trivial case where the subset 
happens to be everything, but that is neither the common nor the 
interesting case. All the interesting and complex cases happen exactly 
when the directory has untracked files in it, and at that point 

 - you really aren't tracking "contents" any more
 - you can no longer recreate the directory from the data you have (so you 
   cannot remove it on branch switches etc)
 - ergo: you're not a content tracker any more, you're a "container" 
   tracker.

And really, the "nontracked files in a directory" is the *default* thing, 
not some really unusual thing that we could disallow.

But I'm not against adding support for "container tracking". I just want 
people to ...
From: David Kastrup
Date: Wednesday, July 18, 2007 - 10:38 am

Since I did not try to argue this, could you beat another strawman?
I have seen this prepackaged rant already, but it does not really
address the problem I have been experiencing.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 11:05 am

How about a bit of honesty?

Here's the quote:

 "The FAQ answer is weaseling on several counts:

  a) No, git only cares about files, or rather git tracks content and
     empty directories have no content.

  In the same manner as empty regular files have no contents, and git
  tracks those.  Existence and permissions are important."

You called it "weaselly" to say that git tracks only content, and then 
very much tried to equate "existence and permissions" with content.

That's the part I answered.

So it wasn't a strawman, it was a direct answer to your assertion. Now go 
away and either come back with the patch to implement it (that I have 
encouraged you to do), or add a ".gitignore" file to the directory (that 
others have told you will solve your problems).

Don't bother talking crap.

			Linus

-

From: Matthieu Moy
Date: Wednesday, July 18, 2007 - 9:39 am

I believe David's point was different.

If you checkout a branch, create an empty directory in this branch
(probably a placeholder, either for future versioned files, or for
generated files), you cannot tell git "this empty directory is in this
branch, but not in other ones" without adding a file in it.

So, doing "git-checkout anotherbranch", this empty directory doesn't
go away. It's just unversioned in both branches, git won't touch it.

-- 
Matthieu
-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 10:06 am

Right. Which is the suggested setup: add an empty ".gitignore" file to the 
directory, and you're done. It now acts "as if" git tracked the directory 
(git will remove the directory when switching branches), but without the 
lie that we really track any directory contents.
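
A minimal sketch of that setup, assuming git is installed; the function name demo_gitignore_placeholder, the branch name with-spool and the directory name spool are all invented for the example.

```shell
# Sketch only: shows a directory "tracked" through an empty .gitignore
# appearing and disappearing on branch switches.
demo_gitignore_placeholder() {
    command -v git >/dev/null 2>&1 || { echo "git not installed; skipping"; return 0; }
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        git init -q .
        git config user.name "Example"
        git config user.email "example@example.invalid"
        git commit -q --allow-empty -m "Initial commit"
        base=$(git rev-parse --abbrev-ref HEAD)

        git checkout -q -b with-spool
        mkdir spool
        : > spool/.gitignore              # empty placeholder file
        git add spool/.gitignore
        git commit -q -m "Keep spool/ via an empty .gitignore"

        git checkout -q "$base"
        if [ -d spool ]; then echo "base: spool present"; else echo "base: spool absent"; fi
        git checkout -q with-spool
        if [ -d spool ]; then echo "with-spool: spool present"; else echo "with-spool: spool absent"; fi
    )
    status=$?
    rm -rf "$work"
    return $status
}

demo_gitignore_placeholder
```

The directory should be absent on the base branch and present again on with-spool, since the only thing git actually tracks is the .gitignore file inside it.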

			Linus
-

From: David Kastrup
Date: Wednesday, July 18, 2007 - 2:37 pm

That implies that every directory in a versioned tree will exclusively
be created under manual and conscious control.  Not by running some
installer or script, unpacking some archive and so on.  But if every
content on a disk was created and put there under manual control of
the disk owner, we could still get along with floppy disks quite fine.
In practice, much more content gets sent around and juggled than what
is under immediate supervision of the user.

This is getting silly: you don't need to pull rabbits out of your
hat.  You said that you are not inclined to do any work in that area
since it does not touch _your_ use cases (well, at least not to a
degree that you consider worth bothering about) but that is no reason
to get into ridiculous arguments about other usage.  No code will come
of that.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 2:45 pm

How hard is it for you to admit that I also said "please send in a patch"?

I don't need it. You do. You do the work. I'm just explaining why the work 
hasn't been done.

		Linus
-

From: David Kastrup
Date: Wednesday, July 18, 2007 - 4:13 pm

Yup, that was one sentence in about 5 pages of bile.  In contrast,
Junio gave a good overview of the technical areas involved here, and
estimates of how best to tackle them.

That's a constructive way to incite somebody to delve into the task
and try to see whether he can come up with something.

But 5 pages of what amounts to "you are an idiot, come up with a

No, you are _defending_ why the work has not been done.  This
rationalizing around the bush is a waste of time.  You probably have
spent quite more time with your venting than Junio did with his
technical analysis, and the latter has been much more helpful.

So why waste all that time and adrenaline on something where you have
already said all you consider relevant?  The arguments don't get any
stronger by shouting, and it is not like you are inconvenienced in any
manner if somebody takes a look at the matter.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 4:16 pm

Gaah.

I'm a damn softie (and soft in the head too, for writing the code).

Ok, here's a trivial patch to start the ball rolling. I'm really not 
interested in taking this patch any further personally, but I'm hoping 
that maybe it can make somebody else who is actually _interested_ in 
tracking empty directories (hint hint) decide that it's a good enough 
start that they can fill in the details.

This really updates three different areas, which are nicely separated into 
three different files, so while it's one single patch, you can actually 
follow along the changes by just looking at the differences in each file, 
which directly translate to separate conceptual changes:

 - builtin-update-index.c

   This simply contains the changes to update the index file. As usual, 
   there are multiple different cases, and they boil down to:

	(a) No index entry existed at all previously. If so, a directory 
	    will first go through the "index_path()" logic, which tries to 
	    create a GITLINK entry for it, if the subdirectory is a git 
	    directory. However, the new thing is that if that fails, it 
	    will instead just create a fake empty tree entry for it, and 
	    set the index mode to S_IFDIR.

	(b) It was a gitlink entry before. It stays as a gitlink entry, 
	    even if it cannot be indexed, and a file/symlink entry in 
	    the working tree is a conflict error.

	(c) It was an empty directory entry before. A directory stays as an 
	    empty directory entry, and a file/symlink entry in the working 
	    tree is a conflict error.

   Somebody should check that we properly delete the directory entry if we 
   add a file under it, I honestly didn't bother to go through all the 
   logic. I *think* we do it correctly just thanks to all the previous 
   code for gitlinks. Whatever.

   What I'm trying to say is that the changes are fairly straightforward, 
   but if somebody decides to push this, they need to think about it a lot 
   more than I'm ready to right ...
From: David Kastrup
Date: Wednesday, July 18, 2007 - 4:42 pm

Well, kudos.  Together with the analysis from Junio, this seems like a
good start.  Would you have any recommendations about what stuff one
should really read in order to get up to scratch about git internals?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 5:22 pm

Well, you do need to understand the index. That's where all the new 
subtlety happens.

The data structures themselves are trivial, and we've supported empty 
trees (at the top level) from the beginning, so that part is not anything 
new.

However, now having a new entry type in the index (S_IFDIR) means that 
anything that interacts with the index needs to think twice. But a lot of 
that is just testing what happens, and so the first thing to do is to have 
a test-suite.

There's also the question about how to show an empty tree in a diff. We've 
never had that: the only time we had empty trees was when we compared a 
totally empty "root" tree against another tree, and then it was obvious. 
But what if the empty tree is a subdirectory of another tree - how do you 
express that in a diff? Do you care? Right now, since we always recurse 
into the tree (and then not find anything), empty trees will simply not 
show up _at_all_ in any diffs.

And what about usability issues elsewhere? With my patch, doing something 
like a

	git add directory/

still won't do anything, because the behaviour of "git add" has always 
been to recurse into directories. So to add a new empty directory, you'd 
have to do

	git update-index --add directory

and that's not exactly user-friendly.

So do you add a "-n" flag to "git add" to tell it to not recurse? Or do 
you always recurse, but then if you notice that the end result is empty, 
you add it as a directory?

All the logic for that whole directory lookup is in git/dir.c, and that 
code takes various flags because different programs want different things 
(show "ignored" files, or ignore them? Show empty directories or ignore 
them? etc).

So primarily, I think the job is:

 - thinking about the index, and the interactions when adding a directory 
   or adding files under a directory that already exists.

   I *think* we get all the corner cases right, because they should be 
   exactly the same as with subprojects, but hey, ...
From: Junio C Hamano
Date: Wednesday, July 18, 2007 - 10:28 pm

Another issue I thought about was what you would do in the step
3 in the following:

 1. David says "mkdir D; git add D"; you add S_IFDIR entry in
    the index at D;

 2. David says "date >D/F; git add D/F"; presumably you drop D
    from the index (to keep the index more backward compatible)
    and add S_IFREG entry at D/F.

 3. David says "git rm D/F".

Have we stopped keeping track of the "empty directory" at this
point?

-

From: Shawn O. Pearce
Date: Wednesday, July 18, 2007 - 10:38 pm

Sadly yes.  But I don't think that's what the folks who want to
track empty directories want to have happen here.

Which is why I'm thinking we just need to track the directory, as a
node in the index, even if there are files in it, and even if we got
that directory and its contained files there by just unpacking trees.

-- 
Shawn.
-

From: Shawn O. Pearce
Date: Wednesday, July 18, 2007 - 11:09 pm

I take this back.  I really don't want that behavior.

If I do:

  mkdir -p foo/bar
  echo hello >foo/bar/world
  git add foo
  git rm -f foo/bar/world

I never asked for foo/bar or foo to stay.  In fact I want them
to disappear from Git entirely, as foo/bar is now empty and has
no content.


But we also cannot do a special --mkdir option for update-index
either, because how do we know that the user designated subtree is
a directory we must always keep in the index?

So I think the only way this works is to have a new mode that we use
in tree (04755 ?) that tells us not only is this thing a subtree,
but also that the user wants it to stay here, even if it is empty.
Those trees are always in the index as a real tree entry, even if
there are files contained in it.

And as far as getting that directory entry created/removed from
the index, well, I think a special flag to update-index would be
in order, much like --chmod=[+-]x.

Just my $0.0002 USD, which really ain't worth much at all.

-- 
Shawn.
-

From: Matthieu Moy
Date: Thursday, July 19, 2007 - 1:13 am

Well, outside git, if you do

$ mkdir -p foo/bar
$ echo hello > foo/bar/world
$ rm -f foo/bar/world

You didn't ask foo/bar to stay either, and still, it's quite natural
to have it stay in your filesystem. So, the same way you'd have run
"rm -r foo", it seems reasonable to me to ask for "git-rm -r foo" if
the user wants to get rid of foo/ itself.

-- 
Matthieu
-

From: Tomash Brechko
Date: Thursday, July 19, 2007 - 3:51 am

Dear Git fellows,

A year or so ago I too would have strongly advocated the need for tracking
empty directories, permissions et al., it seemed so "natural" and
"plain obvious" to me back then.  But since that time I learned to
appreciate the "contents tracking" approach, and now view directories
(paths in general) only as the means for Git to know where to put the
contents on checkout.  This, BTW, is consistent with how Git figures
container copies/renames.

No doubt mighty Git developers can add support for empty directories,
manage to stay backward compatible, think out consistent user
interface etc.  But there's no end to how much information one may
want to store in Git to make it "_file system_ contents tracking
software".  Starting with empty directories, one may argue then that
certain installation trees also need particular file ownership, so
let's store user/group names like tar does.  It was mentioned already
in this thread that in addition to 'rwx' we also would have to store
ACLs (some OSes have only one of these concepts, some both), SELinux
security contexts, perhaps other arbitrary file attributes that may be
part of file system state.

Wouldn't it be better to preserve Git as a contents tracking system,
and add some tools on top of it that can translate file system state
into textual (or binary) form, so it can be stored in current Git?
And then use this textual representation to restore actual file system
attributes/layout on checkout?  And the only change in Git itself
would be some more hooks, for instance one hook before checking out
over the old work tree, and one after the checkout.  Or one can simply
wrap certain Git commands to implement such hooks.
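
A rough sketch of such a tool, limited to directories and their permission bits. Nothing here is an existing git feature: save_dirs and restore_dirs are made-up names, GNU stat (stat -c) is assumed, and directory names are assumed not to contain newlines.

```shell
# Write "<octal mode> <path>" for every directory below $1 (except .git)
# into the manifest file $2, which could then be committed like any file.
save_dirs() {
    (cd "$1" && find . -name .git -prune -o -type d \
        -exec stat -c '%a %n' {} + | sort -k 2) > "$2"
}

# Recreate the directories listed in manifest $2 below $1, then restore
# their modes. Creation comes first so that a restrictive parent mode
# cannot block the creation of its children.
restore_dirs() {
    while read -r mode path; do
        mkdir -p "$1/$path"
    done < "$2"
    while read -r mode path; do
        chmod "$mode" "$1/$path"
    done < "$2"
}
```

A wrapper script or hook could run save_dirs before a commit and restore_dirs after a checkout; ownership, ACLs and the like would need further fields in the manifest.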

In any case, no one is going to be against the new feature if it won't
break anything for those of us who find the pure contents tracking the
right thing.  And storing empty directories by default may not be
natural for everyone.  So before going into technical details of how
this can possibly be implemented, ...
From: Johannes Schindelin
Date: Thursday, July 19, 2007 - 5:16 am

Hi,


Thank you.  It is my impression, too, that after a while it becomes 
obvious what is good and what is not.

FWIW I just whipped up a proof-of-concept patch (so at least _I_ cannot be 
accused of chickening out of writing code):

This adds the command line option "--add-empty-dirs" to "git add", which 
does the only sane thing: putting a placeholder into that directory, and 
adding that.  Since ".gitignore" is already a reserved file name in git, 
it is used as the name of this place holder.

---

	It is probably not fool-proof yet, needs documentation and a test 
	case.  But I am really sick and tired of this discussion.

 builtin-add.c |   25 +++++++++++++++++++++----
 dir.c         |   16 +++++++++++++++-
 dir.h         |    3 ++-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 7345479..1294840 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -47,7 +47,7 @@ static void prune_directory(struct dir_struct *dir, const char **pathspec, int p
 }
 
 static void fill_directory(struct dir_struct *dir, const char **pathspec,
-		int ignored_too)
+		int ignored_too, int substitute_empty_dirs)
 {
 	const char *path, *base;
 	int baselen;
@@ -63,6 +63,7 @@ static void fill_directory(struct dir_struct *dir, const char **pathspec,
 		if (!access(excludes_file, R_OK))
 			add_excludes_from_file(dir, excludes_file);
 	}
+	dir->substitute_empty_directories = substitute_empty_dirs;
 
 	/*
 	 * Calculate common prefix for the pathspec, and
@@ -143,7 +144,8 @@ static const char ignore_warning[] =
 int cmd_add(int argc, const char **argv, const char *prefix)
 {
 	int i, newfd;
-	int verbose = 0, show_only = 0, ignored_too = 0;
+	int verbose = 0, show_only = 0, ignored_too = 0,
+		substitute_empty_dirs = 0;
 	const char **pathspec;
 	struct dir_struct dir;
 	int add_interactive = 0;
@@ -191,6 +193,10 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			take_worktree_changes = 1;
 			continue;
 ...
From: Linus Torvalds
Date: Wednesday, July 18, 2007 - 4:40 pm

Oh, one word of warning: that whole "pretend_sha1_file()" thing won't 
create the object itself, and when I did the limited testing that I did, I 
actually made sure I had a magic zero-sized tree object in my object 
directory.

If you don't, some things will complain, because they end up getting a 
SHA1 that they cannot look up, because *they* didn't create that pretend 
entry.

I didn't know which way I wanted to go with that thing. I was kind of 
thinking that maybe we would just have the zero-sized OBJ_BLOB and 
OBJ_TREE objects as special magical things, and have all git programs just 
do that "pretend" at the beginning.

But that kind of thing is probably just a totally unnecessary special 
case, and instead, that "pretend_sha1_file()" should have just been a

	write_sha1_file(NULL, 0, "tree", ce->sha1);

instead.

Anyway, if there are issues with not finding an object called 
4b825dc642cb6eb9a060e54bf8d69288fbee4904, then that's the empty tree 
object, and that pretend thing was the cause.
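
For reference, that object name can be recomputed by hand: a git object id is the SHA-1 of a header of the form "<type> <size>" followed by a NUL byte and then the content, and the empty tree has no content. The helper name empty_tree_id is made up, and sha1sum from GNU coreutils is assumed.

```shell
# Hash only the header "tree 0" plus the terminating NUL byte,
# since an empty tree has zero bytes of content.
empty_tree_id() {
    printf 'tree 0\000' | sha1sum | cut -d ' ' -f 1
}

empty_tree_id
```

This prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904. With git available, "git hash-object -t tree /dev/null" should give the same id.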

(The git repo itself has the empty tree as an object in it, because one of 
the commits has that - probably as a result of a bug, but there you have 
it)

		Linus
-

From: David Kastrup
Date: Wednesday, July 18, 2007 - 10:34 am

But empty directories which were empty to start with don't go away
since they are not tracked.  And that means that their parents don't
go away.

Git will remove directories which _had_ git-tracked content prior to
the checkout.  But it will not register empty directories created

Linus, condescension is all very nice, but I already told you: I had a
directory hierarchy created outside of git's control (every file comes
into being first outside of git).  This hierarchy contained empty
directories.  The whole hierarchy was committed into git.  git
silently skipped registering empty directories.  Then a different
version got checked out which did not contain the directory hierarchy
in question.  And git left the (unregistered) empty directories in, as
well as all their parent directories.


But I told git to track the whole directory tree recursively.  There
were no uncommitted files it complained about.  It is not reasonable
that it is afterwards unable to remove this when I checkout some other

Sure.  But that it refuses to track the files makes the total behavior
an annoyance.  I don't complain _how_ git handles not being able to
track empty directories.  I complain about it not being able to track

When I tell it to track it, it should not refuse.  Even if it is
empty.  Because if it _stayed_ empty, git can then remove it (and
possibly the parents) when I checkout something else.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Matthieu Moy
Date: Tuesday, July 17, 2007 - 5:39 pm

,----[ http://www.spinics.net/lists/git/msg30730.html ]
| From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
| 
| I wouldn't personally mind if somebody taught git to just track empty
| directories too.
| 
| There is no fundamental git database reason not to allow them: it's in
| fact quite easy to create an empty tree object. The problems with
| empty directories are in the *index*, and they shouldn't be
| insurmountable.
| 
| [...]
`----

-- 
Matthieu
-

From: Junio C Hamano
Date: Tuesday, July 17, 2007 - 7:23 pm

No objections as long as a patch is cleanly made without
regression.  It's just nobody agreed that it is "quite serious"
yet so far, and no fundamental reason against it.

-

From: David Kastrup
Date: Tuesday, July 17, 2007 - 10:56 pm

Thanks.  It certainly is not serious for the Linux kernel source, but
seems awkward for quite a few situations.  Anyway, what is your take
on the situation I described?

That creating some directory hierarchy (happening to contain empty
directories) with some external program, adding and committing it,
then switching to a different branch (or maybe doing a git-reset
--hard) leaves a skeleton of empty directories around?

I find this almost worse than not being able to put them into the
repository: you can't get rid of them anymore either!

I'd be tempted to propose that git should remove empty subdirectories
when cleaning up a removed tree in the working directory, even though
that violates the principle to not delete anything it isn't tracking.
But since you can't get it to track the stuff in the first place...

But the real fix would be to track them.

Does some trick work possibly at checkin time, like putting an empty
file into every empty directory, adding to the index, then removing
all empty files explicitly from the index and then checking in, or is
this hopeless to work around with from the user side without affecting
the repository itself?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Wincent Colaiuta
Date: Tuesday, July 17, 2007 - 11:34 pm

On 18/7/2007, at 7:56, David Kastrup wrote:

From: Junio C Hamano
Date: Tuesday, July 17, 2007 - 11:53 pm

Didn't I say I do not have an objection for somebody who wants
to track empty directories, already?  I probably would not do
that myself but I do not see a reason to forbid it, either.

The right approach to take probably would be to allow entries of
mode 040000 in the index.  Traditionally, we allowed only 100644
(blobs as regular files) and 120000 (blobs as symlinks).  We
recently added 160000 (commit from outer space, aka subproject).

And we do that for all directories, not just empty ones.  So if
you have fileA, empty/, sub/fileB tracked, your index would
probably have these four entries, immediately after read-tree
of an existing tree object:

	100644 15db6f1f27ef7a... 0	fileA
	040000 4b825dc642cb6e... 0	empty
	040000 e125e11d3b63e3... 0	sub
	100644 52054201c2a872... 0	sub/fileB

Making sure that empty/ directory exists in the working tree is
probably done in entry.c; we have been touching that area in an
unrelated thread in the past few days.

If you add sub/fileC, with "update-index" (and "add"), you
invalidate the SHA-1 object name you stored for "sub" (because
there is no point recomputing the tree object until you know you
need a subtree for "sub" part, which does not happen until the
next "write-tree"), and end up with something like:

	100644 15db6f1f27ef7a... 0	fileA
	040000 4b825dc642cb6e... 0	empty
	040000 00000000000000... 0	sub
	100644 52054201c2a872... 0	sub/fileB
	100644 705bf16c546f32... 0	sub/fileC

These "missing" SHA-1 would need to be recomputed on-demand.
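
The invalidation step can be modelled on a text listing like the ones above. This is a toy model, not git's implementation: invalidate_parents is a made-up name, the object names other than the empty tree id are placeholders, and paths are assumed to contain no whitespace. The new entry itself (sub/fileC) would be appended separately.

```shell
# Reads index-like lines "mode sha stage<TAB>path" on stdin and zeroes the
# object name of every 040000 entry that is a parent directory of $1.
invalidate_parents() {
    awk -v new="$1" '
        $1 == "040000" && index(new, $4 "/") == 1 {
            $2 = "0000000000000000000000000000000000000000"
        }
        { printf "%s %s %s\t%s\n", $1, $2, $3, $4 }
    '
}

# Placeholder object names, apart from the well-known empty tree id.
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
FAKE=1111111111111111111111111111111111111111

{
    printf '100644 %s 0\tfileA\n'     "$FAKE"
    printf '040000 %s 0\tempty\n'     "$EMPTY_TREE"
    printf '040000 %s 0\tsub\n'       "$FAKE"
    printf '100644 %s 0\tsub/fileB\n' "$FAKE"
} | invalidate_parents sub/fileC
```

Only the "sub" line comes out with an all-zero object name; "empty" keeps its id because sub/fileC does not live under it.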

We have had necessary infrastructure to do this "keeping
untouched tree object names in the index" for quite some time,
but it is not a part of the index proper (it is stored in an
extension section in the index file, to keep the index
compatible with older versions of git).

Having made it sound so easy, here are the issues I would expect
to be nontrivial (but probably not rocket surgery either).

 * unpack-trees, which is the workhorse for twoway merge (aka
   "switching branches") ...
From: Johan Herland
Date: Friday, July 20, 2007 - 1:29 am

Sorry for jumping in late...

Why do you want to add _all_ directories, and not just the ones we want to 
explicitly track (independent of whether they're empty or not)?

Basically, add a "--dir" flag to git-add, git-rm and friends, to tell them 
you're acting on the directory itself (rather than its (recursive) 
contents). "git-add --dir foo" will add the "040000 123abc... 0 foo" to the 
index/tree whether or not foo is an empty directory. "git-rm --dir foo" will 
remove that entry (or fail if it doesn't exist), but _not_ the contents of 
foo.

Since we're making directory tracking _explicit_, this should all be trivially 
backward-compatible.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Robin Rosenberg
Date: Thursday, July 26, 2007 - 4:33 pm

(
	I don't know which mail is the best to reply to and I probably missed 
	something in the thread, so bear with me if I'm repeating anything.
)

David. Reconsider "tracking" all directories and what that would give, 
compared to explicitly tracking specific ones and the required magic entries.

Say we have a config setting that tells git never to remove empty trees. Linus's 
patches could be a start for representing trees in the index. As an 
optimization the index could prune trees from the index if they contain 
things as long as the index *effectively* remembers all trees.

Using the patches again we could add empty directories to the index and remove 
them. No directory would be removed automatically, except maybe by a merge.

We would probably have only a few empty directories and new unexpected ones
would only pop up when we remove all blobs from one. Git status could tell us
about them so we will not forget them. It could even tell us about "new" empty
directories, which is probably the most important thing you'd want to know. 

Forgetting to untrack an empty directory would not be a big deal.

Whether to retain empty trees or not should be a repository policy, but an all 
or nothing setting.

-- robin
-

From: David Kastrup
Date: Thursday, July 26, 2007 - 10:22 pm

It would be quite a nuisance for a patch-based workflow, since patches
don't talk about the creation and deletion of directories.

The "track only when entered approach" has the advantage that
directories that were only created to accommodate patches will be
removed again when becoming empty.



But it doesn't.  If you do git-add tree, optimizing the dir entry away
since tree/zap exists, then subsequently do git-rm tree/zap, of course
there is nothing to do except remove tree/zap, and the tree is gone.

One can't start tracking trees explicitly only when they become empty,

I currently have the problem that

rm -rf *
unzip some-archive
git-add some-archive
git-commit -a -m whatever
git-checkout something else


I don't want a source management system to tell me whenever it is

With that approach, the workflow

"Apply a patch creating something/hello"
"Undo the patch creating something/hello"

will leave something lying around.  For somebody managing hundreds of
directories, that would be a nuisance.

I don't say that a "track all parents automatically" approach would
not have its merits: it would likely prevent some mistakes and be
easily understandable to most users.  But for managing a patch
workflow, it would appear to get in the way.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
