Technically, that may not be quite correct.
<digression into weeds>
The RDTSC instruction will return a monotonically increasing unique
value, but the execution and retirement of the instruction are not
serialized. So technically, two RDTSC instructions could be issued
simultaneously to multiple execution units, and they may either return
the same value, or the earlier one may stall and complete after the
later one.
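    rdtsc                   # first sample in EDX:EAX
    mov  %eax, %ebx         # save low word
    mov  %edx, %ecx         # save high word
    rdtsc                   # second sample in EDX:EAX
    cmp  %ecx, %edx         # compare high words (EDX - ECX)
    jb   fail               # high word went backwards
    ja   good               # high word advanced: strictly later
    cmp  %ebx, %eax         # high words equal: compare low words
    jbe  fail               # low word did not strictly increase
    jmp  good
fail:
    int3
good:
    ret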
If execution of RDTSC is restricted to a single issue unit, this can
never fail. If it can be issued simultaneously in multiple units, it
can fail, because register renaming may end up reordering the
instruction stream and breaking the dependencies so that it executes as:
    UNIT 1                  UNIT 2
    rdtsc                   rdtsc
    mov %eax, %ebx          (store to local %edx, %eax)
    mov %edx, %ecx          cmp %ebx, local %eax
                            (commit local %edx, %eax to
                             global register)
    cmp %edx, %ecx
    jb fail
    jae fail
If this is indeed the case, both failure modes (identical values and an
apparent step backwards) can be observed. I'm not aware of anything done
internally to maintain serialization, and since the architecture
explicitly states that RDTSC is not serializing, I doubt anything is
done to prevent this situation.
</digression into weeds>
However, that's not the pertinent issue. If the underlying clock has
very low resolution, we don't present a higher-granularity TSC to the
guest, so consecutive reads can return the same value.
There are things that can be done to guarantee unique values (add 1 for
each read, estimate with the TSC, ...), but they have problems of their
own and in general will make things very messy.
Given the above digression, I'm not sure that any code written to rely
on such guarantees is actually sound.
It is plausible, however, that someone computes

    count of some value / (TSC2 - TSC1)

and ends up dividing by zero. So it may be better to bump the counter
by at least one for each call.
--