Sure I do, because it matters a lot for things like - wait for it -
checksumming it.
I've already told you the reason: they made the mistake of wanting to be
case-independent, and a (bad) case compare is easier in NFD.
Once you give strings semantic meaning (and "case independent" implies
that semantic meaning), suddenly normalization looks like a good idea, and
since you're going to corrupt the data *anyway*, who cares? You just
created a file like "Hello", and readdir() returns "hello" (because there
was an old file under that name), and that is a lot more obviously corrupt
than anything normalization alone does to it.
.. but you *have* to look at the octets at some point. They're kind of
what the string is built up of. They never went away, even if you chose to
ignore them. The encoding is really quite important, and is visible both
in memory and on disk.
It's what shows up when you sha1sum, but it's also as simple as what shows
up when you do an "ls -l" and look at a file size.
It doesn't matter if the text is "equivalent", when you then see the
differences in all these small details.
You can shut your eyes as much as you want, and say that you don't care,
but the differences are real, and they are visible.
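To make that concrete, here is a small sketch using only the Python 3 standard library. The file name is made up for illustration; the point is just that the NFC and NFD spellings of the "same" text are different octet strings, so both the byte count and the SHA-1 differ.

```python
import hashlib
import unicodedata

# Hypothetical file name containing a precomposed a-diaeresis (U+00E4).
name = "M\u00e4rchen"

# The same text in the two normalization forms, encoded as UTF-8 octets.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

# The length is what a size listing would show, the digest is what a
# checksum tool would show, if these octets were the file contents.
nfc_len, nfd_len = len(nfc), len(nfd)
nfc_sha1 = hashlib.sha1(nfc).hexdigest()
nfd_sha1 = hashlib.sha1(nfd).hexdigest()

print("NFC:", nfc_len, "bytes, sha1", nfc_sha1)
print("NFD:", nfd_len, "bytes, sha1", nfd_sha1)
```

The two strings compare equal only after you normalize one of them again; anything that looks at the raw octets sees two different things.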
You're right, I messed up. I used a non-combining diaeresis, and you're
right, it doesn't get corrupted. And I think that means that if Apple had
used NFC, we'd not have this problem with Latin1 systems (because then the
UTF-8 representation would be the same).
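A quick way to see the Latin1 point, again as a sketch with the Python 3 standard library (the single-character example is my own choice, not taken from any real repository): a Latin1 a-diaeresis converted straight to UTF-8 comes out as the precomposed character, which is the NFC form, while NFD splits it into a base letter plus a combining mark.

```python
import unicodedata

# a-diaeresis as a single Latin1 byte (illustrative example character).
latin1_bytes = b"\xe4"

# A plain charset conversion, Latin1 -> UTF-8, with no normalization step.
text = latin1_bytes.decode("latin-1")
converted = text.encode("utf-8")

# The same character in both normalization forms, as UTF-8 octets.
nfc = unicodedata.normalize("NFC", text).encode("utf-8")
nfd = unicodedata.normalize("NFD", text).encode("utf-8")

print("converted:", converted.hex())
print("NFC:      ", nfc.hex())
print("NFD:      ", nfd.hex())
```

The plain conversion and NFC give the same octets, and NFD is the odd one out ("a" followed by the combining diaeresis U+0308).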
So I still think that normalization is totally idiotic, but the thing that
actually causes most problems for people on OS X is that they chose the
really inconvenient one.
Linus