I think your theory should be easy to test: Yanmin, could you turn on
CONFIG_MUTEX_DEBUG=y and check by how much AIM7 regresses?
In the CONFIG_MUTEX_DEBUG=y case the mutex debug code does exactly
that: it doesn't use the single-instruction fastpath (it uses
asm-generic/mutex-null.h) but always drops into the slowpath, so that
it can access the debug state. That debug code is about as expensive as
the generic semaphore code's current fastpath, perhaps even more
expensive.
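To make the fastpath/slowpath distinction concrete, here is a minimal
userspace sketch. It is NOT the kernel's mutex code: struct toy_mutex,
toy_lock_fastpath(), toy_lock_nullpath() and toy_run() are hypothetical
names, and it assumes a simple 1 = unlocked / 0 = locked counter. The
"fast" variant tries a single atomic compare-and-exchange and only
falls back to the slowpath on failure; the "null" variant mimics the
spirit of mutex-null.h by always calling the slowpath, which is where
debug checks would live.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy mutex: count == 1 means unlocked, 0 means locked. */
struct toy_mutex {
	atomic_int count;
	atomic_ulong slowpath_calls;	/* how often the slowpath ran */
};

static void toy_mutex_init(struct toy_mutex *m)
{
	atomic_init(&m->count, 1);
	atomic_init(&m->slowpath_calls, 0);
}

/* Slowpath: debug checks, wait queues etc. would go here.
 * In this sketch it just spins until it can move count from 1 to 0. */
static void toy_lock_slowpath(struct toy_mutex *m)
{
	int expected = 1;

	atomic_fetch_add(&m->slowpath_calls, 1);
	while (!atomic_compare_exchange_weak(&m->count, &expected, 0))
		expected = 1;	/* failed CAS overwrote expected; reset it */
}

/* Fastpath: one atomic op; only contended locks reach the slowpath. */
static void toy_lock_fastpath(struct toy_mutex *m)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&m->count, &expected, 0))
		toy_lock_slowpath(m);
}

/* "Null" fastpath: always take the slowpath, like the debug case. */
static void toy_lock_nullpath(struct toy_mutex *m)
{
	toy_lock_slowpath(m);
}

static void toy_unlock(struct toy_mutex *m)
{
	int old = atomic_exchange(&m->count, 1);

	assert(old == 0);	/* debug-style check: must have been locked */
	(void)old;
}

/* Lock/unlock n times without contention; return slowpath entries. */
static unsigned long toy_run(struct toy_mutex *m, int n, bool always_slow)
{
	unsigned long before = atomic_load(&m->slowpath_calls);

	for (int i = 0; i < n; i++) {
		if (always_slow)
			toy_lock_nullpath(m);
		else
			toy_lock_fastpath(m);
		toy_unlock(m);
	}
	return atomic_load(&m->slowpath_calls) - before;
}
```

With no contention the fast variant never enters the slowpath, while
the null variant enters it on every single lock, which is the per-call
overhead the CONFIG_MUTEX_DEBUG=y experiment would add to every mutex
acquisition.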
There's far more normal mutex fastpath use during an AIM7 run than BKL
use. So if the regression is due to direct fastpath overhead, and to
the resulting widening of the window for the real slowdown, we should
see a severe slowdown on AIM7 with CONFIG_MUTEX_DEBUG=y. Agreed?
Ingo