Umm. What's this inability to see that data is data is data?
Why do you think Unicode has anything in particular to do with filenames?
Those same Unicode strings are often part of the file data itself, and
then that encoding damn well is visible in "ls -l".
Doing

	echo ä > file
	ls -l file

sure shows that "underlying octet" thing that you wanted to avoid so much.
My point was that those underlying octets are always there, and they do
matter. The fact that the differences may not be visible when you compare
the normalized forms doesn't make it any less true.
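
To make the octet point concrete, here is a minimal sketch in Python (my choice of language, nothing in the thread depends on it). It assumes the "ä" from the echo example is saved as UTF-8, and the variable names are purely illustrative. It encodes the same character in its precomposed (NFC) and decomposed (NFD) forms and counts the bytes that would land in the file, which is the size "ls -l" reports:

```python
import unicodedata

# The same visible character in the two common normalization forms.
nfc = unicodedata.normalize("NFC", "\u00e4")  # precomposed: U+00E4
nfd = unicodedata.normalize("NFD", "\u00e4")  # decomposed: U+0061 U+0308

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# echo appends a newline, so the file holds one more byte than the string.
nfc_file = nfc_bytes + b"\n"
nfd_file = nfd_bytes + b"\n"

print(nfc_bytes.hex(), len(nfc_file))  # c3a4 3
print(nfd_bytes.hex(), len(nfd_file))  # 61cc88 4
```

The two strings render identically, yet the files differ in both content and size, and any general data tool will see that.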
You can choose to put blinders on and try to claim that normalization is
invisible, but it's only invisible TO THOSE THINGS THAT DON'T WANT TO SEE
IT.
But that doesn't change the fact that a lot of things *do* see it. There
are very few things that are "Unicode specific", and a *lot* of tools that
are just "general data tools".
And git tries to be a general data tool, not a Unicode-specific one.
The problem is that the UTF-8 form differs between normalization forms, so
if you save things in UTF-8 (which we hopefully agree is a sane thing to
do), then you should try to use a representation that people agree on.
And NFC is the more common normalization form by far, so by normalizing to
something else, you actually de-normalize as far as those other people are
concerned.
So if you have to normalize, at least use the normal form!
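
A rough illustration of the "de-normalize" effect, again as a Python sketch under assumptions: the sample string is hypothetical, and `unicodedata.is_normalized` needs Python 3.8 or later. Text that starts out in NFC stops being NFC once something rewrites it as NFD, and its UTF-8 bytes change along the way:

```python
import unicodedata

# Hypothetical sample text, stored precomposed (NFC).
text_nfc = "\u00e4"

# What a tool that "normalizes" to NFD would hand back.
rewritten = unicodedata.normalize("NFD", text_nfc)

was_nfc = unicodedata.is_normalized("NFC", text_nfc)
still_nfc = unicodedata.is_normalized("NFC", rewritten)
same_bytes = text_nfc.encode("utf-8") == rewritten.encode("utf-8")

print(was_nfc, still_nfc, same_bytes)  # True False False
```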
I blame them for encouraging normalization at all.
It's stupid.
You don't need it.
The people who care about "are these strings equivalent" shouldn't do a
"memcmp()" on them in the first place. And if you don't do a memcmp() on
things, then you don't need to normalize.
So you have two cases:
 (a) the cases that care about *identity*. They don't want normalization.
 (b) the cases that care about *equivalence*. And they shouldn't do
     octet-by-octet comparison.
See? Either you want to see equivalence, or you don't. And in neither case
is normalization the right thing to do (except as *possibly* an internal
part of the comparison, but there are actually better ways to check for
equivalence than the brute-force "normalize both and compare results
bitwise").
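
The two cases can be sketched as two separate comparison functions. This is only an illustration in Python: the function names are made up, the inputs are assumed to be valid UTF-8, and the equivalence check deliberately uses the brute-force "normalize both and compare" approach as an internal detail, not one of the better ways alluded to above. The point is that the stored bytes are never rewritten in either case:

```python
import unicodedata


def identical(a: bytes, b: bytes) -> bool:
    """Case (a): identity. Compare the octets as they are, memcmp()-style."""
    return a == b


def equivalent(a: bytes, b: bytes) -> bool:
    """Case (b): canonical equivalence.

    Normalization happens only on temporary copies inside the comparison;
    the caller's data is left untouched.
    """
    sa = unicodedata.normalize("NFC", a.decode("utf-8"))
    sb = unicodedata.normalize("NFC", b.decode("utf-8"))
    return sa == sb


precomposed = b"\xc3\xa4"  # "ä" as U+00E4
decomposed = b"a\xcc\x88"  # "ä" as U+0061 U+0308

print(identical(precomposed, decomposed))   # False
print(equivalent(precomposed, decomposed))  # True
```

A caller picks whichever question it is actually asking, and neither answer requires changing what is stored.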
Linus