On Jan 16, 2008, at 6:38 PM, Linus Torvalds wrote:
My understanding is that normalization is there to help the computer.
That doesn't give it any semantic meaning, because all normal forms of
a given string still represent the exact same string to the user.
The argument for case insensitivity is different than the argument for
normalization. I certainly hope you understand why they are different
arguments, or there's really no point in going further.
You're right, sometimes the sequence matters. As in key sequences. But
we're not talking about key sequences, we're talking about strings.
Just because it matters sometimes doesn't mean it matters all the time.
And how am I supposed to use the same sequence everywhere? When I type
"Märchen", I don't know which form I'm typing, nor should I. It's not
something that I, as a user, should have to know. Especially if I pass
this name through various other utilities before using it - I have no
idea if another utility is going to end up normalizing the name, and
it shouldn't matter, as they are equivalent strings.
On a US keyboard I only have one way of typing ä, and I have no idea
whether it ends up precomposed or decomposed in the resulting byte
stream. And I don't care. Because I'm typing characters, not bytes. I
could be typing in a file in ISO-Latin-1 and I still wouldn't care,
because it looks the same to me. If my filesystem did make a
distinction between the normal forms, and I see that I have a file
named "Märchen", how am I supposed to type that at my keyboard? I
don't know which normal form it's using.
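
To make the precomposed/decomposed distinction concrete, here is a minimal
Python sketch using the standard unicodedata module. The variable names are
purely illustrative, and this is not meant to show how HFS+ or git actually
handle names; it only shows that the two forms differ as byte sequences yet
compare equal once both are brought to the same normal form.

```python
import unicodedata

# Two spellings of the same user-visible string "Märchen".
precomposed = "M\u00e4rchen"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "Ma\u0308rchen"   # 'a' followed by U+0308 COMBINING DIAERESIS

# Encoded as UTF-8, the two spellings are different byte sequences.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
bytes_equal = precomposed_bytes == decomposed_bytes

# Normalizing both to one form (NFC or NFD) makes them compare equal.
same_after_nfc = (unicodedata.normalize("NFC", precomposed)
                  == unicodedata.normalize("NFC", decomposed))
same_after_nfd = (unicodedata.normalize("NFD", precomposed)
                  == unicodedata.normalize("NFD", decomposed))

print("bytes equal:", bytes_equal)
print("equal after NFC:", same_after_nfc)
print("equal after NFD:", same_after_nfd)
```

Both strings render identically on screen, which is why someone typing the
name has no way of telling which one they produced.
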
The fact that you think the normalization of the string matters, I
don't understand.
What a fabulous straw man argument you just put together. I hope you
don't need me to point out why this argument is fundamentally flawed.
I'm speaking as a user, and as such, I shouldn't even have to know
that it's possible to write the same character in multiple different
ways. As a user, HFS+ behaves exactly the way I want it to. You were
talking earlier about not messing with the "user data", but what is
the "user data"? It's the string, not the byte sequence. That's all I
care about - the string. That's all the OS cares about, that's all any
application I use cares about, and that's all git should care about.
-Kevin Ballard
-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com