Side note: the reason people shouldn't worry about it is that this is a
*trivial* thing to handle. You really don't even need to know what you're
doing. And you can test it today, easily.
Having two (differently encoded) files like that is really no different
from the traditional UNIX FAQ of "how do I remove a file starting with
'-'" or even more closely "how do I remove a file that has a character in
it that I cannot get at the keyboard".
In other words, on a bog-standard UNIX (and yes, in this case, I bet OS X
works fine too for this test), just try this:

    filename1=$(echo -e "hello\002there")
    filename2=$(echo -e "hello\003there")
    echo Odd file > "$filename1"
    echo Another odd file > "$filename2"

and now you have two filenames that are actually rather hard to type on
the command line. In fact, for me they even *look* the same:

    [torvalds@woody ~]$ ll hello*
    -rw-rw-r-- 1 torvalds torvalds  9 2008-01-17 08:23 hello?there
    -rw-rw-r-- 1 torvalds torvalds 17 2008-01-17 08:23 hello?there

See?
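
If you want to convince yourself that the two names really are different
even though "ls" prints both as "hello?there", something like the sketch
below should do it. It is only an illustration: it uses printf instead of
"echo -e" (my substitution, since printf handles octal escapes the same way
in every POSIX shell), and it works in a scratch directory from mktemp so
the glob cannot pick up anything else.

```shell
# Scratch directory, so the glob below only sees our two files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Same two files as in the test above, created with printf.
printf 'Odd file\n' > "$(printf 'hello\002there')"
printf 'Another odd file\n' > "$(printf 'hello\003there')"

# Dump each name byte by byte; od shows the control characters as 002 and 003.
for f in hello*there; do
    printf '%s' "$f" | od -c
done

# Count how many distinct names the glob matched.
set -- hello*there
count=$#
```

The names differ in exactly one byte, and the filesystem keeps them apart
without any help.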
Even in my graphical browser, those two filenames look 100% *identical*. I
could give you a screen-shot, but I'm lazy. Just take my word for it, or
just fire up konqueror on Linux (but it may well depend on the particular
font you're using).
[ And yes, for other browsers, you might have something that shows them as
different characters - depending on the font, it might show up as a
small box with [00 02] vs [00 03] in it, for example. But that's also
actually 100% true of the two different encodings of 'ä' - you could
easily have a file browser that shows the multi-character as a
multi-character, exactly to distinguish them and show that one of them
isn't "normalized"!
The point is, as long as the filesystem doesn't corrupt the data, it's
always easy to get at, and there is never any ambiguity. ]
How is this different from "Märchen" spelled with two different encodings
for that "ä"?
I'll tell you: it's not at all different. It's 100% the exact same issue.
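
The same experiment can be sketched with the two spellings of "Märchen"
directly. The byte values below are the standard UTF-8 encodings of U+00E4
(precomposed 'ä') and of 'a' followed by U+0308 (combining diaeresis); the
variable names and file contents are made up for the example. This assumes
a filesystem that stores names as the bytes it was given. On one that
normalizes names, you would expect to end up with a single file instead of
two.

```shell
# Scratch directory, so the glob below only sees our two files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Precomposed form: U+00E4, UTF-8 bytes C3 A4 (octal 303 244).
nfc=$(printf 'M\303\244rchen')
# Decomposed form: 'a' plus U+0308, UTF-8 bytes CC 88 (octal 314 210).
nfd=$(printf 'Ma\314\210rchen')

printf 'precomposed\n' > "$nfc"
printf 'decomposed\n' > "$nfd"

# Hex dump of each name: they look alike on screen, the bytes do not.
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1

# Count the names that match; two on a filesystem that does not normalize.
set -- M*rchen
count=$#
```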
And does that make you perhaps go "Hunh? How do I remove it, or open it?"
And the fact is, those "identical looking" filenames (and thus they must be
the same, and something should have normalized them to the same thing,
no?) are obviously two different files, and they are *really*easy* to edit
and look at.
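
On the command line it is about as easy. A rough sketch, under the same
assumptions as the test above (scratch directory, printf instead of
"echo -e"): let a glob find the names so you never have to type the control
character, tell the files apart by their contents, and rename the one you
want to something typeable. The new name "odd-file" is just an example.

```shell
# Scratch directory with the same two awkward files.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'Odd file\n' > "$(printf 'hello\002there')"
printf 'Another odd file\n' > "$(printf 'hello\003there')"

# "?" matches any single character, including ones you cannot type.
for f in hello?there; do
    # Identify the file by what is in it, not by how its name looks.
    if grep -q '^Odd file$' "$f"; then
        mv "$f" odd-file
    fi
done

# The other file is untouched and still reachable through the same glob.
cat hello?there
```

That is the shell version of looking at the preview icon and picking
"rename".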
Fire up that graphical browser again, and it doesn't even matter whether
the filename looks identical or not, it shows up as two different files,
and you can drag them around independently, rename them there, and at
least my file browser shows clearly which is which, because I get a small
icon with a preview in it, so I directly see which one is the "Odd file"
and which one is the "Another odd file".
So the whole "but they _look_ the same" argument is just total BS. In just
about all character encodings there have always been unique and different
"characters" that _look_ the same on screen, and it has never really made
them actually *be* the same, and it has never been a valid argument for
them being considered the same.
Because even when they *look* the same, that file browser that didn't show
the difference in names visually, still showed them correctly as two
separate files, and I could still just rename them by hand by
right-clicking on them and picking "rename".
So "look the same" is really not a new thing, nor is it even a really hard
thing. Yes, people can get confused by it, but hey, people can get
confused by *anything*. People get confused by filenames starting with a
"-", yet nobody sane really says that filenames cannot start with a dash.
Linus