On Mon, Jan 21, 2008 at 10:12:01AM -0800, Linus Torvalds wrote:
>
> On Mon, 21 Jan 2008, Kevin Ballard wrote:
> > On Jan 21, 2008, at 9:14 AM, Peter Karlsson wrote:
> > >
> > > I happen to prefer the text-as-string-of-characters (or code points,
> > > since you use the other meaning of characters in your posts), since I
> > > come from the text world, having worked a lot on Unicode text
> > > processing.
> > >
> > > You apparently prefer the text-as-sequence-of-octets, which I tend to
> > > dislike because I would have thought computer engineers would have
> > > evolved beyond this when we left the 1900s.
> >
> > I agree. Every single problem that I can recall Linus bringing up as a
> > consequence of HFS+ treating filenames as strings [..]
>
> You say "I agree", BUT YOU DON'T EVEN SEEM TO UNDERSTAND WHAT IS GOING ON.
>
> The fact is, text-as-string-of-codepoints (let's make the "codepoints"
> obvious, so that there is no ambiguity, but I'd also like to make it clear
> that a codepoint *is* how a Unicode character is defined, and a Unicode
> "string" is actually *defined* to be a sequence of codepoints, and totally
> independent of normalization!) is fine.
>
> That was never the issue at all. Unicode codepoints are wonderful.
>
> Now, git _also_ heavily depends on the actual encoding of those
> codepoints, since we create hashes etc, so in fact, as far as git is
> concerned, names have to be in some particular encoding to be hashed, and
> UTF-8 is the only sane encoding for Unicode. People can blather about
> UCS-2 and UTF-16 and UTF-32 all they want, but the fact is, UTF-8 is
> simply technically superior in so many ways that I don't even understand
> why anybody ever uses anything else.
Maybe because it's 1.5 times bigger than UTF-16 for any text in Chinese,
Japanese or Korean?
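
For anyone who wants to check the arithmetic, here is a minimal sketch that compares encoded sizes. The sample strings and the helper name `encoded_sizes` are made up for illustration, and it assumes the text stays within the Basic Multilingual Plane, where most CJK characters take 3 bytes in UTF-8 and 2 bytes in UTF-16:

```python
# Illustrative samples only; any CJK text within the BMP behaves similarly.
samples = {
    "Chinese": "中文文本",
    "Japanese": "日本語のテキスト",
    "Korean": "한국어 텍스트",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte-order mark."""
    # "utf-16-le" is used so that no 2-byte BOM is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name}: UTF-8 {u8} bytes, UTF-16 {u16} bytes, ratio {u8 / u16:.2f}")
```

The ratio only reaches 1.5 for pure CJK text. Any ASCII mixed in (spaces, digits, markup) takes 1 byte in UTF-8 against 2 in UTF-16, which pulls the ratio back down.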
Mike