Some (very few) x86 uarchs do tend to prefer "load-store" like code
generation, and doing a "mov reg,[mem] + op reg" instead of "op [mem]" can
actually be faster on some of them. Not any that are relevant today,
though.
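
To make the two forms concrete, here is a rough sketch in C. The function
and variable names are made up for illustration, and the asm in the comments
(Intel syntax) is only what a compiler might plausibly emit, not output from
any particular gcc version. Check with "gcc -O2 -S" to see what you actually
get.

```c
/* Hypothetical names, for illustration only. */
int counter;

/* "op [mem]" form. A compiler may emit a single read-modify-write
 * instruction, e.g.:
 *     add dword ptr [counter], 1
 */
void bump_direct(void)
{
	counter += 1;
}

/* "load-store" form, spelled out by hand. Roughly:
 *     mov eax, [counter]
 *     add eax, 1
 *     mov [counter], eax
 */
void bump_load_store(void)
{
	int tmp = counter;	/* load */

	tmp += 1;		/* operate on the register copy */
	counter = tmp;		/* store */
}
```

Both functions do the same thing to a plain int, so an optimizing compiler is
free to generate either instruction sequence for either of them. Which one
it picks is a tuning decision.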
Also, that has nothing to do with volatile, and should be controlled by
optimization flags (like -mtune). In fact, I thought there was a separate
flag to do that (i.e. something like "-mload-store"), but I can't find it,
so maybe that's just my fevered brain...
Linus