Re: way to automatically add untracked files?

Previous thread: git-diff new files (without using index) by Miles Bader on Saturday, August 4, 2007 - 8:42 pm. (6 messages)

Next thread: [PATCH] Add 'test-absolute-path' to .gitignore by Johannes Schindelin on Saturday, August 4, 2007 - 10:14 pm. (1 message)
From: Miles Bader
Date: Saturday, August 4, 2007 - 8:31 pm

One thing I often want to do is git-add all untracked files, and also
automatically git-rm all "disappeared" files (I keep my .gitignore files
well maintained, so the list of adding/missing files shown by git status
is almost always correct).  At the same time, I usually want to do "git
add -u" to git-add modified files.

One way to do this seems to be just "git add .", but I'm always slightly
nervous using it because it sits there and churns the disk for an awful
long time (whereas "git status" is instantaneous).  Is this the right
thing to do?  Is there something funny causing the churning?

Thanks,

-Miles

-- 
Saa, shall we dance?  (from a dance-class advertisement)
-

From: Shawn O. Pearce
Date: Saturday, August 4, 2007 - 8:58 pm

That's the correct way to add those new files that aren't ignored.
The problem is actually a small bug in git-add; we did not take the
obvious performance optimization of skipping files that are stat
clean in the index.  So what is happening here during `git add .`
is we are reading and hashing every single file, even if it is
already tracked and is not modified.  In short we're just working
harder than we need to during this operation.

I believe this has been fixed in git 1.5.3-rc3 or rc4.  Not sure
which one; I don't have access to a git repository right now to
look it up.

-- 
Shawn.
-

From: Junio C Hamano
Date: Saturday, August 4, 2007 - 9:13 pm

That performance fix is in rc4.


-

From: Miles Bader
Date: Saturday, August 4, 2007 - 9:00 pm

Oh, also, "git add ." doesn't seem to do the right thing with
"disappeared" files:  If I do:

    mv foo.cc bar.cc
    git add .

then git-status will show a new file "bar.cc", but will list "foo.cc"
as "deleted" in the "Changed but not updated" section.  Perhaps the
right thing will happen if I do "git-commit -a" (though I don't know,
and I don't really want to try it), but this still results in incorrect
"git-diff --cached" output (it shows bar.cc as a new file, not as a
rename of foo.cc).

Am I doing something wrong, or is this just missing functionality?

Thanks,

-Miles
-- 
Do not taunt Happy Fun Ball.
-

From: Shawn O. Pearce
Date: Saturday, August 4, 2007 - 9:13 pm

Right.  Who wants "add" to actually mean "add and delete"?
Shouldn't that be then called "git-add-and-rm"?

We recently talked about this on the mailing list and decided that
git-add shouldn't remove files that have disappeared, as doing so

"git commit -a" will remove disappeared files.  It has for quite

Try adding the -M option to "git-diff".  That will enable the rename
detection, and show the rename you are looking to see.

-- 
Shawn.
-

From: Miles Bader
Date: Saturday, August 4, 2007 - 9:22 pm

"git-add ." can just as easily be thought as meaning "add the current
state of directory ".", including additions and removals"; removals,

No, it doesn't.

The problem seems to be not that git's rename detection isn't enabled
(I have it turned on by default in my global settings), but rather
that git hasn't been told about the removal.

And I don't see any way to automatically tell git "please mark for
removal all files that seem to have disappeared" -- "git-add ." doesn't do
it, and git-rm doesn't seem to have any option for doing this.

Really I want a single command that just tells git "please add to the
index _all changes that you can find_".

Thanks,

-Miles

-- 
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."
-

From: Junio C Hamano
Date: Saturday, August 4, 2007 - 9:23 pm

"git add -u"

-
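
The exchange above can be tried in a scratch repository. What follows is only a sketch, not something posted in the thread: the file names are taken from Miles's example, while the temporary directory and the commit identity are made up. It also assumes a current Git, where a plain "git add ." stages removals as well, so `--ignore-removal` is passed to imitate the 1.5.x behaviour discussed here. The `-M` switch is the rename detection Shawn mentions.

```shell
set -e
repo=$(mktemp -d)            # throwaway directory, assumed for illustration
cd "$repo"
git init -q
printf 'int main() { return 0; }\n' > foo.cc
git add foo.cc
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'initial import'

mv foo.cc bar.cc

# Stage new files only; --ignore-removal imitates the 2007 "git add ."
git add --ignore-removal .
before=$(git status --porcelain)

# Stage changes to tracked files, including the disappearance of foo.cc
git add -u
after=$(git diff --cached -M --name-status)

printf 'after "git add .":\n%s\n' "$before"
printf 'after "git add -u":\n%s\n' "$after"
```

Once the removal has been staged by "git add -u", the index holds both halves of the move and the cached diff can pair them up as a rename.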

From: Miles Bader
Date: Saturday, August 4, 2007 - 9:30 pm

So, to _really_ add all changes, I should give two commands:

   git add .
   git add -u

?

(I tried combining them:  "git add -u .", but that didn't seem to do
anything useful)

-Miles
-- 
o The existentialist, not having a pillow, goes everywhere with the book by
  Sullivan, _I am going to spit on your graves_.
-

From: Junio C Hamano
Date: Saturday, August 4, 2007 - 9:39 pm

You are talking about two different operations.

Adding _new_ files, unless you are just beginning a new project,
is much rarer than updating modified files that you are
already tracking; and "git add new-file..." is what people
usually use for the former.  "git add ." is almost always the
"initial import" and not used later (after all you ought to know
what files you are creating and controlling ;-)).  You get into
an illusion that that is often used, only when you have just
started.  As your project progresses, that feeling will fade
away.

And that is natural, if you think about it for 5 seconds.

"Add everything in sight, although I do not know what they are",
which is essentially what "git add ." is, makes perfect sense
for the initial import and vendor drop (after perhaps rm -fr
?*).  If you are doing your own development, with your working
tree littered with build products and temporary notes files and
whatnot, "git add ." is usually the last thing you would want to
do.

Updating modified files, "git add -u", is more like "git commit
-a" (without creating commit).  You do not add _new_ files, and
that is quite deliberate.

You _could_ argue that people should be more disciplined and
write perfect .gitignore files so that "git add ." is always
safe, but the world does not work that way.



-
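
The difference between the two operations Junio describes can be seen in a small experiment. This is a hedged sketch under assumed names (the files, the ignore patterns and the commit identity are all invented for illustration): "git add -u" touches only files git already tracks, while "git add ." also picks up new files, skipping whatever .gitignore excludes.

```shell
set -e
repo=$(mktemp -d)            # throwaway directory, assumed for illustration
cd "$repo"
git init -q
printf 'build/\n*.o\n' > .gitignore
printf 'v1\n' > tracked.c
git add .gitignore tracked.c
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'initial import'

# Modify a tracked file, create a new source file and some build debris
printf 'v2\n' > tracked.c
printf 'new\n' > new.c
printf 'junk\n' > tracked.o
mkdir build
printf 'junk\n' > build/out.bin

# Only the already-tracked file is staged here
git add -u
staged_by_u=$(git diff --cached --name-only)

# Now the new, non-ignored file is staged too; the debris is left alone
git add .
staged_by_dot=$(git diff --cached --name-only)

printf 'git add -u staged:\n%s\n' "$staged_by_u"
printf 'git add . staged:\n%s\n' "$staged_by_dot"
```

How safe the second step is depends entirely on how complete the ignore patterns are, which is the point of disagreement in the messages above.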

From: Miles Bader
Date: Saturday, August 4, 2007 - 9:53 pm

I imagine this depends strongly on the nature of the project.

My current comments stem from using git on a personal project which I've
been working on for about 2 years; maybe I'm weird, but I seem to
add/remove files fairly regularly (as far as I can tell, it's not an

Sigh.  There are all sorts of people using git, and everybody has their
own working style.  My personal style involves keeping .gitignore
up-to-date so that there's no cruft in the git-status output.

Anyway, I wouldn't be complaining except that I _keep_ running into
circumstances where I need to type "git-add NEWFILE1 NEWFILE2
NEWFILE3...; git rm OLD_FILE1..." -- which is kind of annoying after
seeing a list of _exactly_ the files I need to add/remove output just
previously by git-status.  Thus my wish to have git "do it
automatically."

"git-add -u; git-add ." seems like it should do the job though.

Thanks,

-Miles

-- 
We live, as we dream -- alone....
-

From: Junio C Hamano
Date: Saturday, August 4, 2007 - 10:04 pm

As Linus explained in another thread, "git rm" is largely
unneeded.  Just work with the filesystem in normal UNIX way, and
be done with "git add -u" or even "git commit -a" and you will
be fine.

If you are more perfect than most other people in maintaining
the .gitignore file, you do not even have to name individual
files like "git add NEWFILE1..." -- you can always safely run
"git add .".

Most of us are not as perfect as you are, as you might have
noticed that Randal pointed out this morning that we missed a
new entry from our own .gitignore ;-) I highly suspect that we
will be hated by most of our users if we changed "git add -u" to
add everything in sight for this reason, and I also suspect they
will feel that "git add-remove --all" will be code bloat for
little gain.

-

From: Miles Bader
Date: Saturday, August 4, 2007 - 10:17 pm

I agree that a change to "git-add -u" would be silly... :-)

I was just looking for a convenient way to reduce my typing on those
occasions when I do have a bunch of added/removed/renamed files;
"git-add -u; git add ." seems to do the trick (of course I always
check what git-status says first!).

Thanks,

-Miles

-- 
The automobile has not merely taken over the street, it has dissolved the
living tissue of the city.  Its appetite for space is absolutely insatiable;
moving and parked, it devours urban land, leaving the buildings as mere islands
of habitable space in a sea of dangerous and ugly traffic.
[James Marston Fitch, New York Times, 1 May 1960]
-

From: Johannes Schindelin
Date: Saturday, August 4, 2007 - 10:23 pm

Hi,


Why didn't you say so?  You can always create an alias.  Problem solved.

Ciao,
Dscho

-

From: Miles Bader
Date: Saturday, August 4, 2007 - 10:27 pm

I did, I thought... :-)

-Miles 

-- 
Somebody has to do something, and it's just incredibly pathetic that it
has to be us.  -- Jerry Garcia
-

From: Steffen Prohaska
Date: Sunday, August 5, 2007 - 4:22 am

I exactly need the functionality that Miles is describing for
the following good reason:

Mac OS X has the notion of a bundle, which is a directory that
contains related files that are fully controlled by the application
that is writing that bundle. The bundle functionality is
directly supported by the OS and most applications save their
data as bundles. For example on Mac OS X, the Openoffice format,
which packs related files in a zip file, would just be a directory
with all related files grouped together (no ZIP archive needed).

So here is what I need: I want to be able to track a directory
with all its contents. The data inside the directory are not
under my control. It's only the directory that matters for me.

Git is already quite good at that because it doesn't need to
place anything inside the opaque directory! Subversion for example
has no chance because it clutters the directory with .svn
directories, which will be removed by the next Save (an
application first creates a new temporary directory, stores
all data there, moves the old directory to a backup location,
and renames the new directory to the final destination only
if no problems occurred).

When I started with git I figured out that

    git-ls-files -z --others dir | git-update-index --add -z --stdin
    git commit -a

does the job for me. Would

    git add dir
    git add -u dir
    git commit

be equivalent, but restricted to the changes in dir?

	Steffen


-
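
One way to check Steffen's question is to run his second sequence in a scratch repository and look at what ends up staged. The sketch below is an illustration only: the directory layout, file names and commit identity are assumptions, and "dir" stands in for an application bundle that has been rewritten behind git's back.

```shell
set -e
repo=$(mktemp -d)            # throwaway directory, assumed for illustration
cd "$repo"
git init -q
mkdir dir
printf 'a\n' > dir/old.dat
printf 'b\n' > dir/keep.dat
printf 'c\n' > outside.txt
git add .
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'initial import'

# Simulate the application rewriting the bundle, plus an unrelated edit
rm dir/old.dat
printf 'b2\n' > dir/keep.dat
printf 'n\n' > dir/new.dat
printf 'c2\n' > outside.txt

# The two commands from the message, limited to the bundle directory
git add dir
git add -u dir

staged=$(git diff --cached --no-renames --name-status)
unstaged=$(git diff --name-only)

printf 'staged:\n%s\n' "$staged"
printf 'still unstaged:\n%s\n' "$unstaged"
```

In this experiment the addition, modification and removal inside "dir" are all staged, and the edit outside it is left for a later commit.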

From: Johan Herland
Date: Sunday, August 5, 2007 - 5:11 am

So different users seem to have two different (almost incompatible) 
expectations of git-add:

1. git-add adds new files into the index. git-add has _no_ business removing 
deleted files from the index.

2. git-add updates the index according to the state of the working tree. 
This includes adding new files and removing deleted files.


Both interpretations are useful and worth supporting, but git-add currently 
seems focused on #1 (and rightly so, IMHO).

Even though #2 can be achieved by using a couple of git-add commands (or a 
longer series of more obscure plumbing-level commands), it might be worth 
considering the more user-friendly alternative of adding a dedicated 
command for supporting #2. Such a command already exists in a similar RCS:

---
$ hg addremove --help
hg addremove [OPTION]... [FILE]...

add all new files, delete all missing files

    Add all new files and remove all missing files from the repository.

    New files are ignored if they match any of the patterns in .hgignore. As
    with add, these changes take effect at the next commit.

[...]
---

Adding a git-addremove command should not be much work, and it would be a 
lot friendlier to people whose workflow is more aligned with #2 than #1.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: David Kastrup
Date: Sunday, August 5, 2007 - 5:17 am

Maybe just git-add -a?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Theodore Tso
Date: Sunday, August 5, 2007 - 9:11 am

Not much work at all:


(And the performance problem with git add . is fixed in 1.5.3-rc4,
right?)

						- Ted
-

From: Steffen Prohaska
Date: Sunday, August 5, 2007 - 1:04 pm

But how can I handle the [FILE]... from above?

	Steffen

-

From: Steffen Prohaska
Date: Sunday, August 5, 2007 - 9:58 pm

Thanks.

"Starting with version 1.5.3, git supports appending the
arguments to commands prefixed with "!", too. If you need
to perform a reordering, or to use an argument twice, you
can use this trick:

[alias]
         example = !sh -c "ls $1 $0"

NOTE: the arguments start with $0, not with $1 as you are
used from shell scripts." [cited from the link above]

should do the job.

	Steffen


-

From: Johan Herland
Date: Sunday, August 5, 2007 - 12:16 pm

Nice :)

But I'm wondering whether we'd want to include it in git by default (instead 

Yes, according to Junio elsewhere in this thread.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Miles Bader
Date: Sunday, August 5, 2007 - 5:00 pm

An easier-to-type name e.g. "addrm" would be good though...

-miles

-- 
`Life is a boundless sea of bitterness'
-

From: Johannes Schindelin
Date: Sunday, August 5, 2007 - 5:16 pm

Hi,


Note that this will not work: you have to add an exclamation mark before 
"git add", because it has to execute two commands.

Note also that I do _not_ suggest using --system.  This option forces you 
to run git as root on sane systems, which I think is wrong.  Rather use 

I recommend against that, too.  All too often, I have some temporary files 
in the working tree, and I'll be dimmed if I'm the only one.  So 
"addremove" adds too much possibility for pilot errors.

My opinion here is not set in stone, though.  Maybe you can convince me.

Ciao,
Dscho

-
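
An alias along the lines discussed here might look like the following. This is a sketch and not the alias originally posted: the name "addrm" is the one Miles proposes, the alias is stored in the repository's own configuration rather than with --system, and the file names and commit identity are invented. The leading "!" tells git to hand the whole line to the shell, which is what allows two commands to be chained.

```shell
set -e
repo=$(mktemp -d)            # throwaway directory, assumed for illustration
cd "$repo"
git init -q
printf 'one\n' > gone.txt
git add gone.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'initial import'

# Repository-local alias; without the "!" git would treat the text
# as arguments to a single git command
git config alias.addrm '!git add . && git add -u'

rm gone.txt
printf 'two\n' > fresh.txt

git addrm
staged=$(git diff --cached --no-renames --name-status)
printf '%s\n' "$staged"
```

Aliases that start with "!" are run from the top-level directory of the repository, so this one always covers the whole working tree regardless of where it is invoked.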

From: Miles Bader
Date: Sunday, August 5, 2007 - 8:09 pm

"Recommend against it"?  Why?

It's a separate command, so if it doesn't fit your working style, don't
use it.  I think it _is_ a well-defined and useful action ("snapshot the
working dir") that people would sometimes like to perform, and having a
simple git command to do it would be good.

Moreover, as an almost trivial alias, "code bloat" is hardly an argument
against it!

[But please, call it "addrm" -- "addremove" is just gratuitously long...] 

-Miles

-- 
o The existentialist, not having a pillow, goes everywhere with the book by
  Sullivan, _I am going to spit on your graves_.
-

From: Johannes Schindelin
Date: Sunday, August 5, 2007 - 8:21 pm

Hi,

[please, netiquette says that you should Cc _at least_ the one you're 
responding to]



Hah!  If that were true, we'd have a lot less mails like "I tried this and 
it did not work", only to find out that the person assumed that 
documentation is for wimps, and tried a command that "sounded" like it 
would do the right thing.

Ciao,
Dscho

-

From: Miles Bader
Date: Sunday, August 5, 2007 - 8:45 pm

Huh?  How is it any worse than the underlying commands it uses
("git add ." in particular)?!  Indeed, it seems rather less likely to

Git is not exactly a user-coddling, ultra-hand-holding application, nor
does it seem to have that as a goal.  It offers _tons_ of rope to hang
yourself if you wish (though it usually offers lots of ways to recover).

Rather git seems to have as a goal being a useful toolkit for managing
source trees, and based on what I've seen, tries to accommodate many
different styles of usage (rather than trying to force a certain style
down the users' throats -- as some VCSs try to do ...).

-miles

-- 
/\ /\
(^.^)
(")")
*This is the cute kitty virus, please copy this into your sig so it can spread.
-

From: Johan Herland
Date: Monday, August 6, 2007 - 12:46 am

Ok, in that spirit I also suggest removing _all_ git plumbing-level commands
from the default installation. I also suggest adding a confirmation dialog to
any command that alters the repo, since we have to protect the user against
"pilot errors".

Get real. Adding a separate command (provided it's well implemented and
documented) does not push the user off a cliff. Just because the command
doesn't fit your workflow doesn't mean it's dangerous and should never be
included. Just don't use it.

If git were only to support the (probably non-existing) intersection of its
users' workflows, we would probably have to pull e.g. git-rebase out of the
tree, because (according to some) rewriting history is evil, and extremely

Having commands that "sound" like they do the right thing is not a bad idea
at all. We should have more of those.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net
-

From: Johannes Schindelin
Date: Monday, August 6, 2007 - 5:17 am

Hi,


Oh, come on.  You were talking about a porcelain command.

Ciao,
Dscho

-

From: Junio C Hamano
Date: Monday, August 6, 2007 - 1:45 am

It obviously is not the time to do this as I have already said
that I won't look at anything but fixes and documentation
updates until 1.5.3, but I am not opposed to have "git add -a $paths"
which would do something like "git add $paths && git add -u $paths".
We also might want to add "git add --refresh $paths" while at
it, which were brought up recently in a separate thread.



-

From: David Kastrup
Date: Monday, August 6, 2007 - 11:19 am

I'm all for it.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Miles Bader
Date: Monday, August 6, 2007 - 5:08 pm

Me too... :-)

[I think it's good that it be part of the "add" command (instead of a
separate command/alias), because a new user stands a better chance of
finding it in the documentation... when I was trying to figure out how
to do this by myself, I certainly started by reading the man page for
git-add!]

Thanks,

-Miles

-- 
((lambda (x) (list x x)) (lambda (x) (list x x)))
-

From: Linus Torvalds
Date: Saturday, August 4, 2007 - 10:03 pm

Do "git status -a" to figure out the removed ones.

I actually think we should probably make "git add ." do it too, but it's 
not how we've done it historically ("git add ." really ends up just 
reading the working directory tree and adding all the files it sees: so by 
definition it doesn't do anything at all to files it does *not* see, ie 

Well, it's just "behaviour". It's probably largely historical, in that 
"git add" used to be thought of as "adding new files", but obviously then 
it got extended to mean "stage old files for commit" too, but in that 
extension, the "remove old files" never came up.

But git certainly has the capability. "git commit -a" will notice all the 
files that went away and automatically remove them, so

	git add .
	git commit -a

will do what you want (except, as we found out last week, we've had a huge 
performance regression, so that's actually a really slow way to do it, and 
so it's actually faster to do

	git ls-files -o | git update-index --add --stdin
	git commit -a

instead (where the first one just adds the *new* files, and then obviously 
the "git commit -a" does the right thing for old files, whether deleted or 
modified)

		Linus
-

From: Junio C Hamano
Date: Saturday, August 4, 2007 - 10:14 pm

Is it still the case after the fix in rc4?  Other than the
theoretical "on multi-core, ls-files and update-index can run in
parallel" performance boost potential, I thought the fixed
"git-add ." would be the same...

-

From: David Kastrup
Date: Sunday, August 5, 2007 - 12:32 am

When I did my apprenticeship, one thing I learnt was that to
accomplish a repetitive task comprised of several steps, you organize
it in a way that does not require you to change the tool you are
holding/using until you have finished using it.

What's good for the user is good for the computer: even on single core
systems, working off a complete pipeline buffer before switching
context again will help keeping disk positioning and cache poisoning
down.  However, it will depend on the scheduler: if it never allows
pipes to even slightly fill up (which has been the normal behavior of
the Linux scheduler for years in spite of complaints I voiced several
times), you don't get the advantages from this sort of processing.
CFS could conceivably help in many use cases since then the context
switch depends on more than just "pipe has some minimal content?"
which is pretty much the worst choice for context switches in batch
processing.  However, as long as we are talking buffered I/O (FILE*
and block buffering), we are losing some parallelism potential and

Possibly.  After all, there _is_ overhead associated with pipes, and
currently released kernels' scheduling behavior reaps no cache
poisoning gains.  Whatever.  I think I'll do a large test.
Speculation is not everything.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: David Kastrup
Date: Sunday, August 5, 2007 - 3:33 am

dak@lola:/home/tmp/texlive$ git-init
Initialized empty Git repository in .git/
dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist add .

real    9m36.256s
user    2m2.408s
sys     0m25.874s
dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist add .

real    0m34.161s
user    0m0.448s
sys     0m2.212s

[So the rc4 fix seems to have made it.]

dak@lola:/home/tmp/texlive$ rm -rf .git;git-init
Initialized empty Git repository in .git/

dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist ls-files -z -m -o .|(cd /usr/local/texlive/2007/texmf-dist;git --git-dir=/home/tmp/texlive/.git update-index --add -z --stdin)

real    8m9.370s
user    2m1.172s
sys     0m25.138s
dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist ls-files -z -m -o .|(cd /usr/local/texlive/2007/texmf-dist;git --git-dir=/home/tmp/texlive/.git update-index --add -z --stdin)

real    6m4.447s
user    0m16.801s
sys     0m12.333s
dak@lola:/home/tmp/texlive$ 

[Hm.  Maybe "modified" files are not what I think they are?]

dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist ls-files -z -o .|(cd /usr/local/texlive/2007/texmf-dist;git --git-dir=/home/tmp/texlive/.git update-index --add -z --stdin)

real    6m0.120s
user    0m16.977s
sys     0m12.653s

[No, doesn't help.]

[Just for kicks, let's try getting the Linux scheduler out of our hair
in the initial case.]

dak@lola:/home/tmp/texlive$ time git --work-tree=/usr/local/texlive/2007/texmf-dist ls-files -z -m -o .|dd bs=8k|(cd /usr/local/texlive/2007/texmf-dist;git --git-dir=/home/tmp/texlive/.git update-index --add -z --stdin)
201+1 records in
201+1 records out
1650230 bytes (1.7 MB) copied, 513.125 seconds, 3.2 kB/s

real    8m45.088s
user    2m2.052s
sys     0m25.870s

[Hm, does more damage than it helps.]

So in summary: git-ls-files -m is apparently lacking the optimization
of git-add ...
-

From: Miles Bader
Date: Sunday, August 5, 2007 - 12:34 am

I notice that "git ls-files -o" doesn't do normal ignore-processing, so
for instance all my .o and editor backup files show up in the output...
Is that expected or is it a bug (I tried versions "1.5.2.4" and
"1.5.3.rc3.91.g5c75-dirty")?

If I do:

   git-ls-files -o --exclude-per-directory=.gitignore --exclude-from=$HOME/.gitignore

it works more like I'd expect.

Thanks,

-Miles

-- 
`The suburb is an obsolete and contradictory form of human settlement'
-

From: Linus Torvalds
Date: Sunday, August 5, 2007 - 10:04 am

It's expected (I just didn't try the command line I gave you).

"git ls-files" is low-level plumbing, and those things generally do only 
what you ask from them and never anything user-friendly. In particular, 
they tend to avoid policy decisions. An example of this is "git diff" that 
colorizes the output by default as you have specified, but "git diff-tree" 
that does not.

			Linus
-
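
The plumbing behaviour Miles ran into can be reproduced with a couple of files. The sketch below is illustrative only, with invented file names. It uses `--exclude-standard`, an option not mentioned in the thread, which asks "git ls-files" to apply the usual ignore rules (.gitignore plus the per-repository and per-user exclude files) in place of the longer option list Miles gives. The filtered list is then fed to "git update-index" in the style of Linus's pipeline.

```shell
set -e
repo=$(mktemp -d)            # throwaway directory, assumed for illustration
cd "$repo"
git init -q
printf '*.o\n' > .gitignore
printf 'src\n' > main.c
printf 'obj\n' > main.o

# Plumbing default: every untracked file, ignored or not
all_others=$(git ls-files -o)

# With the standard ignore rules applied
filtered=$(git ls-files -o --exclude-standard)

# Add only the non-ignored untracked files to the index
git ls-files -z -o --exclude-standard | git update-index --add -z --stdin
indexed=$(git ls-files)

printf 'ls-files -o:\n%s\n' "$all_others"
printf 'ls-files -o --exclude-standard:\n%s\n' "$filtered"
printf 'index now holds:\n%s\n' "$indexed"
```

The -z flags on both sides of the pipe keep file names with unusual characters intact, as in Steffen's earlier command.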
