That would require a polymorphic __fls macro that adapts to 32-bit and 64-bit
arguments. That is not good C style.
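
To illustrate, here is a rough sketch of what such a size-dispatching macro
might look like. It assumes the GCC/Clang builtins __builtin_clz and
__builtin_clzll; the names fls32_sketch, fls64_sketch and fls_poly_sketch are
made up for this example and are not existing kernel functions.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel code): index of the highest set bit.
 * The result is undefined for x == 0, same as the underlying builtins.
 */
static inline unsigned int fls32_sketch(uint32_t x)
{
	return 31 - (unsigned int)__builtin_clz(x);
}

static inline unsigned int fls64_sketch(uint64_t x)
{
	return 63 - (unsigned int)__builtin_clzll(x);
}

/*
 * The "polymorphic" part: pick the helper from the operand size.
 * sizeof() is a compile-time constant, so the unused branch is dead code.
 */
#define fls_poly_sketch(x) \
	(sizeof(x) == 8 ? fls64_sketch(x) : fls32_sketch((uint32_t)(x)))
```

The caller cannot tell from the call site which variant runs; it depends
silently on the type of the argument, which is the style problem.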
AFAIK it only exists because some ancient SPARC chips had incredibly
slow multipliers.
I bet most alternative approaches that might be slightly
faster for larger bit strings would make the one-bit
case slower.
-Andi
--