I would not advocate a single system-wide cpumask of idle CPUs. As
Peter Zijlstra notes in a follow-up post, that's too hot a cache line
and clearly doesn't scale.

But I would think it would be OK to have a separate cpumask per node
that marks just the node-local idle CPUs. We have other per-node data
already. If we only support this optional load balancing level across
the other CPUs on the same node (or smaller domains, such as the cores
in a package), that should work, shouldn't it?
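
To make the per-node idea a bit more concrete, here is a rough
user-space sketch in C11. It is not kernel code and not a proposed
patch. The names (node_idle, mark_cpu_idle, mark_cpu_busy,
find_idle_cpu_on_node), the fixed topology of 4 nodes with 16 CPUs
each, and the 64-byte cache line are all assumptions made up for
illustration. The only point is that each node's mask sits on its own
cache line, so an idle/busy transition on one node never dirties a
line that CPUs on another node are reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical topology, for illustration only. */
#define NR_NODES       4
#define CPUS_PER_NODE  16   /* must be <= 64 to fit in one mask word */
#define CACHE_LINE     64   /* assumed cache line size in bytes */

/*
 * One idle mask per node. The alignment pads each entry out to its
 * own cache line, so writers on different nodes do not share a line.
 */
struct node_idle {
    _Alignas(CACHE_LINE) _Atomic uint64_t idle_mask;
};

static struct node_idle node_idle[NR_NODES];

/* Assumed mapping: CPUs are numbered contiguously within each node. */
static int cpu_to_node(int cpu) { return cpu / CPUS_PER_NODE; }
static int cpu_to_bit(int cpu)  { return cpu % CPUS_PER_NODE; }

/* Set this CPU's bit in its own node's mask. */
static void mark_cpu_idle(int cpu)
{
    assert(cpu >= 0 && cpu < NR_NODES * CPUS_PER_NODE);
    atomic_fetch_or(&node_idle[cpu_to_node(cpu)].idle_mask,
                    UINT64_C(1) << cpu_to_bit(cpu));
}

/* Clear this CPU's bit in its own node's mask. */
static void mark_cpu_busy(int cpu)
{
    assert(cpu >= 0 && cpu < NR_NODES * CPUS_PER_NODE);
    atomic_fetch_and(&node_idle[cpu_to_node(cpu)].idle_mask,
                     ~(UINT64_C(1) << cpu_to_bit(cpu)));
}

/*
 * Return the lowest-numbered idle CPU on the same node as 'cpu',
 * excluding 'cpu' itself, or -1 if there is none. Only the
 * node-local mask is read; other nodes are never consulted.
 */
static int find_idle_cpu_on_node(int cpu)
{
    assert(cpu >= 0 && cpu < NR_NODES * CPUS_PER_NODE);
    int node = cpu_to_node(cpu);
    uint64_t mask = atomic_load(&node_idle[node].idle_mask);

    mask &= ~(UINT64_C(1) << cpu_to_bit(cpu));
    if (mask == 0)
        return -1;

    int bit = 0;
    while (!(mask & 1)) {
        mask >>= 1;
        bit++;
    }
    return node * CPUS_PER_NODE + bit;
}
```

A busy CPU looking to push work would call find_idle_cpu_on_node()
with its own CPU number and give up if it gets -1, rather than going
off-node. The snapshot read from the mask can be stale by the time it
is used, so a real implementation would still have to recheck the
target CPU before migrating anything.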

If you see a good solution here that you can provide, good. But if my
brainstorming ideas have problems, don't hesitate to object to them.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214