An interesting (but perhaps difficult to achieve) optimization would be
to spin in userspace.
How many cores (or hardware threads) does this machine have? At a 10%
duty cycle you have 25 waiters behind the lock on average. I don't
think this is realistic, and it means that spinning is invoked only rarely.
I'd be interested in seeing runs where the average number of waiters is
0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention. An
average of 25 waiters on compute-bound code means the application needs
to be rewritten; no amount of mutex tweaking will help it.
Does the wakeup code select the spinning waiter, or just a random waiter?