On Mon, Feb 11, 2008 at 04:22:11PM +1100, David Chinner wrote:
So what you want is all XFS processing (for a given filesystem,
presumably) on a limited set of cores (ideally 1) and all block
and SCSI processing (for a given device) on a similarly limited
set.
On Altix, that was far more important than having the interrupt
and issue CPU be close to the hardware -- at least with typical
LSI or Qlogic controllers where there are only one or two MMIO
reads per command issued, and completions can be stacked up.
There is still an advantage to being close to the hardware, but
a much bigger advantage to not bouncing cachelines.
Maybe what you want is a multistage completion mechanism where
each stage can run on a different CPU, if thread context switches
are cheaper than bouncing data structures around....
jeremy
--