From: Nick Piggin <nickpiggin@yahoo.com.au>
Date: Wed, 27 Aug 2008 17:47:14 +1000

There are a lot of indirect costs that are hard to see as well.

Two things a lot of these cross-call dispatch paths do are:

1) Clear self-cpu
2) AND with cpus_online

#1 can normally be a simple bit clear, but some places implement it with something like "cpus_andn(X, cpumask_of_cpu(cpu))", which needs a full temporary cpumask.

It's simply easier to move those two things down to the bottom of the APIC programming code. That code just loops over the cpumask doing an expensive APIC I/O operation anyway, so we might as well overlap it with these "skip self-cpu" and "skip not-online cpus" checks.

And oh yeah, we get the stack wastage fixed too. Isn't that what we were talking about? :-)
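
To make the idea above concrete, here is a rough userspace sketch, not kernel code. Everything in it is an assumption for illustration: toy_cpumask_t is a 64-bit stand-in for cpumask_t, toy_send_ipi() stands in for the expensive APIC write, and the toy_dispatch_* names are hypothetical. The first variant filters up front through a temporary mask, the second folds the "skip self-cpu" and "skip not-online cpus" checks into the send loop.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs (assumption). */
typedef uint64_t toy_cpumask_t;

#define TOY_NR_CPUS 64

/* Records which CPUs were "sent" an IPI; stands in for the APIC I/O write. */
static toy_cpumask_t toy_ipi_sent;
static int toy_ipi_count;

static void toy_send_ipi(int cpu)
{
    toy_ipi_sent |= (toy_cpumask_t)1 << cpu;
    toy_ipi_count++;
}

/* Variant A: filter first, using a temporary mask on the stack.
 * With a real cpumask_t and a large NR_CPUS this copy is the stack wastage. */
static void toy_dispatch_prefiltered(toy_cpumask_t mask, toy_cpumask_t online,
                                     int self)
{
    assert(self >= 0 && self < TOY_NR_CPUS);

    toy_cpumask_t tmp = mask;               /* temporary copy */
    tmp &= ~((toy_cpumask_t)1 << self);     /* 1) clear self-cpu */
    tmp &= online;                          /* 2) AND with online mask */

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (tmp & ((toy_cpumask_t)1 << cpu))
            toy_send_ipi(cpu);
}

/* Variant B: no temporary mask; both checks happen inside the send loop,
 * where their cost is small next to the I/O operation. */
static void toy_dispatch_inline(const toy_cpumask_t *mask,
                                const toy_cpumask_t *online, int self)
{
    assert(self >= 0 && self < TOY_NR_CPUS);

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_cpumask_t bit = (toy_cpumask_t)1 << cpu;

        if (!(*mask & bit))
            continue;                       /* not a target */
        if (cpu == self)
            continue;                       /* skip self-cpu */
        if (!(*online & bit))
            continue;                       /* skip not-online cpus */
        toy_send_ipi(cpu);
    }
}
```

Both variants hit the same set of CPUs. The difference is that variant B takes the masks by pointer and never builds a modified copy, so callers do not need a cpumask-sized temporary.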
