>>> On 12/23/2010 at 11:54 PM, in message
<1293166464.22802.415.camel@gandalf.stny.rr.com>, Steven Rostedt
<rostedt@goodmis.org> wrote:
Well, I think that would be a good datapoint and is one of the things I'd like to see.
This is why I am skeptical. You are essentially asserting there are two issues here, IIUC:
1) The intent of avoiding a wakeup is broken and we take the double whammy of a mb()
plus the wakeup() anyway.
2) mb() is apparently slower than wakeup().
I agree (1) is plausible, though I would like to see the traces to confirm. It's been a long time
since I looked at that code, but I think either the original code ran in RUNNING_MUTEX and was
inadvertently broken in the meantime, or the other cpu would have transitioned to RUNNING on
its own when we flipped the owner before the release-side check was performed. Or perhaps
we just plain screwed this up and it was racy ;) I'm not sure. But as Peter (M) stated, it seems
like a shame to walk away from the concept without further investigation. I think everyone can
agree that at the very least, if it is in fact taking a double whammy, we should fix that.
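To make sure we are talking about the same pattern, here is roughly the shape I have in mind,
as a userspace C11 sketch. It is not the actual rtmutex code: every name in it (sketch_lock,
sketch_unlock, sketch_wakeup, the WAITER_* states) is made up for illustration, and
atomic_thread_fence() merely plays the role of mb(). The release side flips the owner, issues
the full barrier, and pays for a wakeup only if the waiter is seen sleeping.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the real patch. */
enum { WAITER_SPINNING = 0, WAITER_SLEEPING = 1 };

struct sketch_lock {
    _Atomic(int) owner;        /* 0 = unowned, otherwise an owner id */
    _Atomic(int) waiter_state; /* what the top waiter is currently doing */
    int wakeups;               /* how many times we paid for a wakeup */
};

/* Stand-in for the wakeup: count it and mark the waiter as running again. */
static void sketch_wakeup(struct sketch_lock *l)
{
    l->wakeups++;
    atomic_store_explicit(&l->waiter_state, WAITER_SPINNING,
                          memory_order_relaxed);
}

/*
 * Release path: flip the owner, full barrier, then check the waiter.
 * Returns true if a wakeup was issued. If the check always sees
 * WAITER_SLEEPING, we pay for the barrier *and* the wakeup every time,
 * which is the "double whammy" case.
 */
static bool sketch_unlock(struct sketch_lock *l)
{
    atomic_store_explicit(&l->owner, 0, memory_order_relaxed);

    /* Plays the role of mb(): order the owner store before the state load. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&l->waiter_state, memory_order_relaxed)
            == WAITER_SLEEPING) {
        sketch_wakeup(l);
        return true;
    }
    return false; /* waiter is spinning and will see owner == 0 on its own */
}
```

If traces show sketch_unlock()'s equivalent returning true on essentially every release, then
the optimization is not buying anything and (1) is confirmed.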
For (2), I am skeptical in two parts ;). You stated you thought mb() was just as expensive as a
wakeup, which seems suspect to me, given a wakeup needs to be a superset of a barrier
II[R|U]C. Let's call this "2a". In addition, your results, where you removed the logic, went
straight to a wakeup(), and found dbench was actually faster than the "fixed mb()" path, would
imply wakeup() is actually _faster_ than mb(). Let's call this "2b".
For (2a), I would like to see some traces that compare mb() to wakeup() (of a presumably
already running task that happens to be in the INTERRUPTIBLE state) to be convinced that
wakeup() is equal/faster. I suspect it isn't.
For (2b), I would suggest that we don't rely on dbench alone in evaluating the merit of the
change. In some ways, it's a great test for this type of change since it leans heavily on the coarse
VFS locks. However, dbench is also pretty odd and thrives on somewhat chaotic behavior. For
instance, it loves the "lateral steal" logic, even though this patch technically breaks fairness. I
would therefore propose that a suite of benchmarks known for creating as much lock contention
as possible be run in addition to dbench.
Happy new year, all,
-Greg