Re: git on MacOSX and files with decomposed utf-8 file names

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Thursday, January 17, 2008 - 9:43 am

On Thu, 17 Jan 2008, Wincent Colaiuta wrote:

Side note: the thing is, the reason people shouldn't worry about it is 
that this is a *trivial* thing to handle. You really don't even need to 
know what you're doing. And you can test it today, easily.

Having two (differently encoded) files like that is really no different 
from the traditional UNIX FAQ of "how do I remove a file starting with 
'-'" or even more closely "how do I remove a file that has a character in 
it that I cannot get at the keyboard".

In other words, on a bog-standard UNIX (and yes, in this case, I bet OS X 
works fine too for this test), just try this

	filename1=$(echo -e "hello\002there")
	filename2=$(echo -e "hello\003there")
	echo Odd file > "$filename1"
	echo Another odd file > "$filename2"

and now you have a filename that is actually rather hard to type on the 
command line. In fact, for me they even *look* the same:

	[torvalds@woody ~]$ ll hello*
	-rw-rw-r-- 1 torvalds torvalds  9 2008-01-17 08:23 hello?there
	-rw-rw-r-- 1 torvalds torvalds 17 2008-01-17 08:23 hello?there

See?

Even in my graphical browser, those two filenames look 100% *identical*. I 
could give you a screen-shot, but I'm lazy. Just take my word for it, or 
just fire up konqueror on Linux (but it may well depend on the particular 
font you're using).

[ And yes, for other browsers, you might have something that shows them as 
  different characters - depending on the font, it might show up as a 
  small box with [00 02] vs [00 03] in it, for example. But that's also 
  actually 100% true of the two different encodings of 'ä' - you could 
  easily have a file broswer that shows the multi-character as a 
  multi-character, exactly to distinguish them and show that one of them 
  isn't "normalized"!

  The point is, once the filesystem doesn't corrupt the data, it's always 
  easy to get at, and there is never any ambiguity. ]

How is this different from "Märchen" spelled with two different encodings 
for that "ä"?

I'll tell you: it's not at all different. It's 100% the exact same issue.

And does that make you perhaps go "Hunh? How do I remove it, or open it?"

And the fact is, those "idential looking" filenames (and thus they must be 
the same, and something should have normalized them to the same thing, 
no?) are obviously two different files, and they are *really*easy* to edit 
and look at.

Fire up that graphical browser again, and it doesn't even matter whether 
the filename looks identical or not, it shows up as two different files, 
and you can drag them around independently, rename them there, and at 
least my file browser shows clearly which is which, because I get a small 
icon with a preview in it, so I directly see which one is the "Odd file" 
and which one is the "Another odd file".

So the whole "but they _look_ the same" argument is just total BS. In just 
about all character encodings there has always been unique and different 
"characters" that _look_ the same on screen, and it has never really made 
them actually *be* the same, and it has never been a valid argument for 
them being considered the same.

Because even when they *look* the same, that file browser that didn't show 
the difference in names visually, still showed them correctly as two 
separate files, and I could still just rename them by hand by 
right-clicking on them and picking "rename". 

So "look the same" is really not a new thing, nor is it even a really hard 
thing. Yes, people can get confused by it, but hey, people can get 
confused by *anything*. People get confused by filenames starting with a 
"-", yet nobody sane really says that filenames cannot start with a dash.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 8:34 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 9:32 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 3:23 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Eyvind Bernhardsen, (Wed Jan 16, 3:37 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Wed Jan 16, 4:03 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 5:33 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 5:35 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Wed Jan 16, 5:54 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 5:57 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Wed Jan 16, 6:08 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Wed Jan 16, 9:51 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Wed Jan 16, 10:11 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Junio C Hamano, (Wed Jan 16, 10:15 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Mitch Tishmack, (Thu Jan 17, 12:11 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 3:08 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 3:22 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 3:28 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 4:10 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 4:46 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 4:51 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 5:53 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 6:05 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Thu Jan 17, 6:40 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 8:57 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Thu Jan 17, 9:43 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 11:18 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Thu Jan 17, 11:42 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Thu Jan 17, 11:44 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Thu Jan 17, 12:11 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Thu Jan 17, 3:09 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Robin Rosenberg, (Thu Jan 17, 5:44 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Robin Rosenberg, (Thu Jan 17, 6:05 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Robin Rosenberg, (Thu Jan 17, 6:27 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Robin Rosenberg, (Fri Jan 18, 2:42 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Fri Jan 18, 10:11 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Fri Jan 18, 1:50 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Sat Jan 19, 11:58 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Sat Jan 19, 3:58 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Sat Jan 19, 5:11 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Sat Jan 19, 10:45 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Dmitry Potapov, (Sat Jan 19, 11:14 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Sat Jan 19, 11:53 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Sun Jan 20, 12:26 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Wincent Colaiuta, (Sun Jan 20, 2:34 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Johannes Schindelin, (Sun Jan 20, 6:15 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Mon Jan 21, 11:12 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Mon Jan 21, 11:16 am)
Re: git on MacOSX and files with decomposed utf-8 file names, Linus Torvalds, (Mon Jan 21, 12:41 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 2:06 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 2:17 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 2:43 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 3:45 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 3:56 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Martin Langhoff, (Mon Jan 21, 8:21 pm)
Re: git on MacOSX and files with decomposed utf-8 file names, Eric W. Biederman, (Tue Jan 22, 7:46 pm)