Bug? FWIW, the original patch from October shows:
M changed
M M changed-again
M changed-staged
D deleted
D deleted-staged
(where changed-again has both staged changes and further changes in the
work tree).
The gap between these two are to show the rename similarity index, which
The output mimicked what was in Shawn's "repo" tool announcement IIRC.
My patch was supposed to give interested parties hint to base a patch like
Tuncer's on (I think this answers your last question, too).
--
OK, that makes more sense. And your example shows a good answer to my
earlier question: why is just sorting the output of the the three
commands (diff, diff --cached, and ls-files -o) not as nice. The answer
is that we need to actually combine lines when files are appear in
I went back and read some of the background. I think having this work
with the wt-status machinery is reasonable, then. My concerns with
Tuncer's patch are still:
- this should not be part of builtin-commit.c; it doesn't use any of
the same code except that which has already been lib-ified in
wt-status.[ch].
- I don't think the "mini" status is really related to this. The novel
thing here is collating the outputs into a single sorted list. But
the "mini" output is not about that at all:
1. It doesn't care about full output, so it should be able to exit
early from the diff, avoid rename detection, etc, so that it is
as quick as possible.
2. It doesn't collate the output at all. It is about three
separate symbols for the three separate lists.
--
Oh, sorry, I was misreading the "mini" output. I thought the three flags corresponded to the staged, unstaged, and untracked changes. But they are "unstaged or staged but added", "unstaged or staged but changed", or "untracked" (although right now the last is triggered by unmerged entries?). I honestly don't see much point in differentiating added versus changed files. Splitting it into "some things are staged" and "some things are not staged" makes more sense to me. But if you do want that distinction then an early exit from the diff is more complicated (since you might have to keep going to see if there are any of _either_ type). -Peff --
OK, I realize this is not exactly what the proposed --mini does. But
here is more along the lines of what I was thinking.
Warm cache, it runs in .042s on my git repo, about half of which is the
untracked files check. It takes about .49s on the kernel repo. The
read_directory() bit is not optimized at all, and could probably benefit
from an early return (OTOH, the worst case is still going to need to
look at every path).
I am not particularly interested in a fancy prompt myself, but maybe
this will help somebody else.
The patch relies on the index_differs_from() patch that Stephan
posted earlier today.
---
.gitignore | 1 +
Makefile | 1 +
builtin-ministatus.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
builtin.h | 1 +
git.c | 1 +
5 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/.gitignore b/.gitignore
index 055eb54..de2249b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -81,6 +81,7 @@ git-mergetool
git-mktag
git-mktree
git-name-rev
+git-ministatus
git-mv
git-notes
git-pack-redundant
diff --git a/Makefile b/Makefile
index a0ca137..9145c7b 100644
--- a/Makefile
+++ b/Makefile
@@ -559,6 +559,7 @@ BUILTIN_OBJS += builtin-merge-base.o
BUILTIN_OBJS += builtin-merge-file.o
BUILTIN_OBJS += builtin-merge-ours.o
BUILTIN_OBJS += builtin-merge-recursive.o
+BUILTIN_OBJS += builtin-ministatus.o
BUILTIN_OBJS += builtin-mv.o
BUILTIN_OBJS += builtin-name-rev.o
BUILTIN_OBJS += builtin-pack-objects.o
diff --git a/builtin-ministatus.c b/builtin-ministatus.c
new file mode 100644
index 0000000..c9f8e7f
--- /dev/null
+++ b/builtin-ministatus.c
@@ -0,0 +1,52 @@
+#include "cache.h"
+#include "diff.h"
+#include "commit.h"
+#include "revision.h"
+#include "dir.h"
+
+static int worktree_is_dirty(void)
+{
+ struct rev_info rev;
+ init_revisions(&rev, "");
+ setup_revisions(0, NULL, &rev, NULL);
+ DIFF_OPT_SET(&rev.diffopt, QUIET);
+ DIFF_OPT_SET(&rev.diffopt, ...I tried this and it did not run faster than my experiment. I had to add a missing opening curly brace in have_untracked() before it compiled. --
I'm surprised, based on the numbers you gave before. I guess your machine is just a lot slower than what I am experimenting with. I think you will have to profile to see what is taking so long, then, if you Sorry about that. I added the comment just above directly into the patch, and obviously managed to butcher the brace while I was doing it. -Peff --
I suspect that with a large tree your have_untracked() would show
unnecessary overhead from dir_add_name(), because you only want one bit of
information but there is no way to stop with "ok, we know enough". This
toy patch adds a trivial "early return" to read_directory() codepath, but
there are two sad things about it.
* In order to cheaply run "is there a single other file", you really
should scan the level you have already opened first before digging
deeper. I didn't bother because the primary use of read_directory is
the depth first traversal.
* In a cloned work tree, the tracked files and directories come early in
the physical directory and then crufts you created yourself comes at
the end in readdir() order. We tend to read a lot of tracked ones
first and the finally hit other files.
builtin-ministatus.c | 4 +++-
dir.c | 32 +++++++++++++++++++++++++++-----
dir.h | 2 ++
3 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/builtin-ministatus.c b/builtin-ministatus.c
index aff9e5a..4b5a191 100644
--- a/builtin-ministatus.c
+++ b/builtin-ministatus.c
@@ -25,7 +25,7 @@ static int have_untracked(void)
read_directory(&dir, ".", "", 0, NULL);
/* XXX we are probably leaking memory from dir */
- for (i = 0; i < dir.nr; i++)
+ for (i = 0; i < dir.nr; i++) {
struct dir_entry *ent = dir.entries[i];
if (cache_name_is_other(ent->name, ent->len))
return 1;
@@ -47,6 +47,8 @@ int cmd_ministatus(int argc, const char **argv, const char *prefix)
putchar('*');
if (have_untracked())
putchar('?');
+ if (untracked_files_exist())
+ putchar('%');
return 0;
}
diff --git a/dir.c b/dir.c
index cfd1ea5..8d4fcdd 100644
--- a/dir.c
+++ b/dir.c
@@ -16,7 +16,10 @@ struct path_simplify {
static int read_directory_recursive(struct dir_struct *dir,
const char *path, const char *base, int baselen,
- int check_only, const struct path_simplify *simplify);
+ int mode, const ...I've done some measurements from within .bashrc via a custom timer() bash function called before and after git ministatus call. The box I'm testing on is a Core2Duo with 1.8GHz and 2GB of RAM. I have faster machines I can test on but do Git coding on this one. For WebKit.git this is similar to the previous patch. It does also take at least 8 or 9 seconds. That's an improvement over 8-11secs. --
My timings are exactly the same (with my have_untracked removed,
of course). Due most likely to the fact that I keep my repos clean, so
we have to look at every directory to realize there are no files.
Here are a few timings:
For the relative times of each action, I get (in the linux-2.6 tree):
- nothing (just read_cache / refresh_cache)
real 0m0.178s
user 0m0.060s
sys 0m0.116s
- just index_differs_from (minus nothing = 0.107s)
real 0m0.285s
user 0m0.140s
sys 0m0.140s
- just worktree_is_dirty (minus nothing = 0.0s)
real 0m0.178s
user 0m0.048s
sys 0m0.132s
- just untracked_files_exist (minus nothing = 0.184s)
real 0m0.362s
user 0m0.188s
sys 0m0.176s
So untracked_files is definitely the worst, but I was surprised that
the index to HEAD diff takes so long.
For just index_differs_from, gprof claims we spend most of our time in
unpack-trees stuff:
33.33 0.01 0.01 28301 0.00 0.00 unpack_callback
33.33 0.02 0.01 28301 0.00 0.00 unpack_nondirectories
33.33 0.03 0.01 26627 0.00 0.00 convert_from_disk
For just untracked_files_exist, gprof claims we spend most of our time
dealing with the index name hash (and obviously a bit of time in
excluded_1):
27.27 0.03 0.03 1479931 0.00 0.00 icase_hash
18.18 0.05 0.02 74142 0.00 0.00 excluded_1
9.09 0.06 0.01 120178 0.00 0.00 lookup_hash_entry
9.09 0.07 0.01 49855 0.00 0.00 hash_name
9.09 0.08 0.01 26627 0.00 0.00 convert_from_disk
9.09 0.09 0.01 24739 0.00 0.00 excluded
9.09 0.10 0.01 18 0.56 0.88 grow_hash_table
So maybe there are speedups to be had there. ISTR the icase_hash stuff
came about to support case-challenged filesystems. I haven't looked to
see if it could be turned into a ...There is an obvious optimization you can do to "diff-index --cached" using
cache-tree. If your index is really clean, computing the tree object the
index would represent (without writing the tree object out) and comparing
it against HEAD^{tree} may be a tad faster.
--
Clever, but I think you may just be trading one scenario for "worst
case" versus another (i.e., now when you _do_ have a difference to do an
early return, you still have to touch everything in the cache).
Just for fun, I timed a quick and dirty implementation. It looks like
generating the tree actually ends up taking just a little bit longer,
even with an unchanged index (which should be the case it speeds up).
But maybe I just did it wrong. My implementation was basically just:
t = cache_tree();
if (!cache_tree_fully_valid(t))
cache_tree_update(t, active_cache, active_nr, 0, 0);
get_sha1("HEAD", sha1);
return hashcmp(t->sha1, sha1);
-Peff
--
I have a question. Why do you have the gap for the rename similarity between the two but not between the second status and the filename? --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List | ixbge: remove TX lock and redo TX accounting. |
| Linux Kernel Mailing List | ixgbe: fix several counter register errata |
| Linux Kernel Mailing List | b43: fix build with CONFIG_SSB_PCIHOST=n |
| Linux Kernel Mailing List | 9p: block-based virtio client |
