Well, the thing is, mmfile_is_binary() doesn't really have a big downside
if it's wrong one way or the other.
In contrast, LF<->CRLF conversion, if wrong, actually corrupts binary files.
So I felt it was better to be really safe than sorry. It's *much* better
to miss some CRLF translation than to do too much of it.
That said, I'm sure it could be improved a lot. In particular, characters
in the range 0x00 - 0x1f are clearly "more binary" than the 0x7f+ range,
with the obvious exceptions (tab, cr, lf).
0x00 - which is the only one mmfile_is_binary() uses - is arguably the
"most binary" one, of course, but it might be interesting to give
different weights to the whole range. Especially for small files, the
fact that there is no 0x00 byte in no way indicates that it's not
"binary".
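
To make the weighting idea concrete, here is a rough sketch of what such a scorer might look like. This is not what mmfile_is_binary() does today. The names binary_score() and looks_binary(), the three weights, and the threshold are all made up for illustration and have not been tuned against real files.

```c
#include <stddef.h>

/* Hypothetical weights, chosen only for illustration. */
enum { W_NUL = 8, W_CTRL = 4, W_HIGH = 1 };

/*
 * Return a "binary-ness" score for buf[0..len): each byte adds a weight
 * depending on how suspicious it is. Tab, CR and LF are treated as text.
 */
static unsigned long binary_score(const unsigned char *buf, size_t len)
{
	unsigned long score = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (c == '\t' || c == '\r' || c == '\n')
			continue;		/* the obvious exceptions */
		if (c == 0x00)
			score += W_NUL;		/* the "most binary" byte */
		else if (c < 0x20)
			score += W_CTRL;	/* other control characters */
		else if (c >= 0x7f)
			score += W_HIGH;	/* DEL and the high range */
	}
	return score;
}

/*
 * Err on the side of "binary": flag the buffer as soon as the score
 * exceeds one weighted point per 100 bytes (an assumed threshold).
 * An empty buffer is treated as text.
 */
static int looks_binary(const unsigned char *buf, size_t len)
{
	if (!len)
		return 0;
	return binary_score(buf, len) * 100 > len;
}
```

With a threshold this strict, text with a lot of non-ASCII characters would also be classified as binary. That is the safe direction: the cost is a missed CRLF translation, not a corrupted file.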
This whole issue is obviously one reason I'd like to involve the filename
itself, and make it use a ".gitattributes" file - exactly because that
allows you to be much more aggressive and more precise.
(0x00 may be one of the more _common_ characters in many binary files,
which makes it a good character to search for too, so I don't really have
any hugely strong opinions here. After all, the whole heuristic is off by
default anyway, so it's "really safe" ;^)
Linus