> I'm using the futex_lock branch of my futextest test suite to gather results.
> The test is performance/futex_lock.c and can be checked out here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git
>
> Per Avi's suggestion I added asm instruction based period and duty-cycle options
> to do some testing. A series of plots with 256 threads, one per period length,
> is shown at the URL below. Each plot compares raw futex lock (normal, no
> spinning) with a single spinner (adaptive) and with multiple spinners (aas) for
> each of several duty-cycles to determine performance at various levels of
> contention. The test measure the number of lock/unlock pairs performed per
> second (in thousands).
>
>
http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/
>
> Avi's suggested starting point was 1000 instructions with a 10% duty cycle.
> That plot "Futex Lock/Unlock (Period=1000, Threads=256)" does appear to be the
> most interesting of the lot. It's not so short that other overhead becomes the
> bottleneck, and not so long so as to make adaptive spinning benefits show up
> as noise in the numbers. The adaptive spin (with a single waiter) consistently
> beats the normal run, and outperforms aas for most duty-cycles. I rechecked a
> few points on this plot to confirm and the relative scores remained consistent.
> These plots were generated using 10,000,000 iterations per datapoint.
>
> Lastly, I should mention that these results all underperform a simple
> pthread_mutex_lock()/pthread_mutex_unlock() pair. I'm looking into why but felt
> I was overdue in sharing what I have to date. A test comparing this to a
> sched_yield() style userspace spinlock would probably be more appropraite.
>
>