Could you point these places out? All uses of sched_clock() that I
could see in kernel/sched.c seemed to be related to working out how long
something spent executing, either in the scheduler proper or in
benchmarking cache characteristics.

Yes. Xen, at least, provides nanosecond-resolution information about
how long each vcpu spent in its various states. But the question is how
this information should be exposed to the scheduler. I could provide a
raw dump of the info, but in general the scheduler doesn't care and
other hypervisors might not be able to produce the same information.
The essential information is "how long did process X actually run on a
real CPU?" And that, as far as I can tell, is the question
sched_clock() is already designed to answer.

No, I'm talking about CPU speed changes as a completely separate case,
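To make the distinction concrete, here is a minimal sketch, assuming a hypervisor exposes cumulative per-vCPU nanosecond counters for each state. The struct, enum and function names (struct vcpu_times, real_run_ns(), stolen_ns()) are hypothetical and only loosely modeled on the kind of data Xen reports; they are not a real Xen or kernel interface. The point is that the scheduler only consumes the "really ran" delta, while the rest of the raw dump stays behind the abstraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vCPU states, loosely modeled on what a hypervisor reports. */
enum vcpu_state { VCPU_RUNNING, VCPU_RUNNABLE, VCPU_BLOCKED, VCPU_OFFLINE,
                  VCPU_NR_STATES };

/* Hypothetical record: cumulative nanoseconds spent in each state. */
struct vcpu_times {
    uint64_t ns[VCPU_NR_STATES];
};

/* The one number the scheduler is assumed to need: how long the vCPU
 * actually ran on a real CPU between two samples. */
static uint64_t real_run_ns(const struct vcpu_times *before,
                            const struct vcpu_times *after)
{
    /* Counters are cumulative, so they must never go backwards. */
    assert(after->ns[VCPU_RUNNING] >= before->ns[VCPU_RUNNING]);
    return after->ns[VCPU_RUNNING] - before->ns[VCPU_RUNNING];
}

/* Time the vCPU wanted to run (or was descheduled) but did not get a real
 * CPU. Treated here as "stolen"; the scheduler need not see this split. */
static uint64_t stolen_ns(const struct vcpu_times *before,
                          const struct vcpu_times *after)
{
    return (after->ns[VCPU_RUNNABLE] - before->ns[VCPU_RUNNABLE]) +
           (after->ns[VCPU_OFFLINE] - before->ns[VCPU_OFFLINE]);
}
```

A hypervisor that cannot report the per-state breakdown could still supply real_run_ns() some other way, which is the argument for exposing only that quantity.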
which is primarily an issue while running a kernel on bare hardware.
But it is, in some ways, more complex than running on a hypervisor.
There are numerous mechanisms for CPU speed control: some kernel-driven,
some autonomous, some stepwise, some continuous. I'm arguing that it's
the cpufreq subsystem's job to keep track of all that detail, but the
only information it needs to provide to the scheduler is, again, "how
much work did my process get done on the CPU?"
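As an illustration of the stepwise case only, here is a minimal sketch, assuming the frequency layer is notified of each change and reports frequencies in kHz. It converts wall-clock time into "work" expressed as nanoseconds at maximum frequency. The names (struct work_clock, scale_to_max_freq(), work_clock_update()) are hypothetical and are not an existing cpufreq or scheduler API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: convert wall-clock ns into ns-at-max-frequency.
 * Assumes delta_ns * cur_khz fits in 64 bits (short intervals). */
static uint64_t scale_to_max_freq(uint64_t delta_ns, uint32_t cur_khz,
                                  uint32_t max_khz)
{
    assert(max_khz > 0 && cur_khz <= max_khz);
    return delta_ns * cur_khz / max_khz;
}

/* Hypothetical per-CPU "work clock" for stepwise frequency changes. */
struct work_clock {
    uint64_t last_ns;  /* timestamp of the previous update */
    uint32_t cur_khz;  /* frequency in effect since last_ns */
    uint32_t max_khz;  /* maximum frequency of this CPU */
    uint64_t work_ns;  /* accumulated max-frequency-equivalent ns */
};

/* Called on each frequency change: credit the segment that just ended at
 * the old frequency, then switch to the new one. */
static void work_clock_update(struct work_clock *wc, uint64_t now_ns,
                              uint32_t new_khz)
{
    assert(now_ns >= wc->last_ns);
    wc->work_ns += scale_to_max_freq(now_ns - wc->last_ns,
                                     wc->cur_khz, wc->max_khz);
    wc->last_ns = now_ns;
    wc->cur_khz = new_khz;
}
```

The scheduler would only read work_ns deltas. Autonomous or continuous speed changes cannot be captured by change notifications like this; there, a sampled hardware ratio (for example the x86 APERF/MPERF counters) would have to stand in for cur_khz/max_khz, which is exactly the kind of detail that belongs in cpufreq rather than in the scheduler.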