Some (very few) x86 uarchs do tend to prefer "load-store" like code
generation, and doing a "mov reg,[mem] + op reg" instead of "op [mem]" can
actually be faster on some of them. Not any that are relevant today.
Also, that has nothing to do with volatile, and should be controlled by
optimization flags (like -mtune). In fact, I thought there was a separate
flag to do that (ie something like "-mload-store"), but I can't find it,
so maybe that's just my fevered brain..
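
To make the two code shapes concrete, here's a rough sketch. The function names are made up, and the asm in the comments is hand-written to show the two forms; it is not the actual output of any particular compiler or version.

```c
/*
 * For "acc + *p" an x86 compiler can pick either of two shapes:
 *
 *   "op [mem]" form (memory operand folded into the ALU op):
 *       add  reg, [mem]
 *
 *   "load-store" form (separate load, then register-only op):
 *       mov  tmp, [mem]
 *       add  reg, tmp
 *
 * Both read the memory location exactly once, so either shape is a
 * valid translation whether or not the pointee is volatile. Which
 * one is faster is a per-uarch tuning question.
 */

/* Plain access: the compiler is free to use either shape. */
int add_plain(int acc, const int *p)
{
    return acc + *p;
}

/* Volatile access: still exactly one read of *p, so nothing about
 * volatile by itself requires the load-store shape. */
int add_volatile(int acc, const volatile int *p)
{
    return acc + *p;
}
```

Comparing the output of "gcc -O2 -S" on these two functions, with different -mtune values, is one way to see which shape the compiler actually picks for each.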