I recently did some work on sparc64 to use cpumask pointers
as much as possible.
The only case that didn't work was due to a limitation in the
arch interfaces for the new generic smp_call_function() code:
arch_send_call_function_ipi() is passed a cpumask_t by value
instead of a pointer to one.
But other than that, the whole sparc64 SMP stuff uses cpumask_t
pointers only.
What it comes down to is that you have to do the "self cpu"
and other tests in the cross-call dispatch routines themselves,
instead of at the top-level working on cpumask_t objects.
Otherwise you have to modify cpumask_t objects, which means
plunking copies of them onto the stack where they take up silly
amounts of space.
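
To make the shape of that concrete, here is a minimal sketch in
plain userspace C. It is not the sparc64 code: toy_cpumask,
toy_cpu_set(), toy_cpu_isset() and toy_xcall_deliver() are made-up
names standing in for cpumask_t and a cross-call dispatch routine,
and "sending" a cross-call is reduced to recording the target cpu.
The point is only that the dispatch routine reads the caller's mask
through a const pointer and skips the current cpu inside its loop,
so nobody has to copy the mask just to clear one bit.

```c
#define TOY_NR_CPUS        256
#define TOY_BITS_PER_LONG  (8 * sizeof(unsigned long))
#define TOY_MASK_LONGS \
	((TOY_NR_CPUS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Hypothetical stand-in for cpumask_t: one bit per possible cpu. */
struct toy_cpumask {
	unsigned long bits[TOY_MASK_LONGS];
};

static void toy_cpu_set(struct toy_cpumask *m, int cpu)
{
	m->bits[cpu / TOY_BITS_PER_LONG] |= 1UL << (cpu % TOY_BITS_PER_LONG);
}

static int toy_cpu_isset(const struct toy_cpumask *m, int cpu)
{
	return (m->bits[cpu / TOY_BITS_PER_LONG] >>
		(cpu % TOY_BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical dispatch routine.  The mask is only read, never
 * modified, and the "self cpu" test happens here in the loop.
 * Each cpu that would get a cross-call is recorded in targets[]
 * (which must have room for TOY_NR_CPUS entries); the number of
 * recorded cpus is returned.
 */
static int toy_xcall_deliver(const struct toy_cpumask *mask,
			     int self_cpu, int *targets)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (cpu == self_cpu)
			continue;	/* never cross-call ourselves */
		if (!toy_cpu_isset(mask, cpu))
			continue;
		targets[n++] = cpu;	/* stand-in for the real send */
	}
	return n;
}
```

The alternative at the top level would be to declare a local
struct toy_cpumask, copy the caller's mask into it, clear the self
bit and pass that down, which is exactly the on-stack copy being
avoided.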