While looking at some other issue recently, we encountered this smp_mb()
placement issue. x86 specific code also needs some similar fixes. Patch for
that will follow soon.
Please review the appended generic-ipi fix.
thanks,
suresh
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: generic-ipi: fix the smp_mb() placement
smp_mb() is needed (to make the memory operations visible globally) before
sending the IPI on the sender, and the receiver (on Alpha at least) needs
smp_read_barrier_depends() in the handler before reading the call_single_queue
list in a lock-free fashion.
On x86, x2apic mode register accesses for sending IPIs don't have serializing
semantics. So the need for smp_mb() before sending the IPI becomes more
critical in x2apic mode.
Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
smp_mb() doesn't mean anything on the sender when the IPI receiver is not
doing anything special (like a memory fence) after clearing CSD_FLAG_WAIT.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
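For reviewers, the ordering described above can be sketched in userspace C11.
This is only an illustration under stated assumptions, not the kernel code:
every demo_* name is hypothetical, the queue is a simple LIFO push rather than
list_add_tail() under dst->lock, an atomic flag stands in for the IPI, and an
acquire fence stands in for smp_read_barrier_depends() (which has no direct
C11 equivalent and is weaker than an acquire fence).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CSD_FLAG_WAIT 0x01u

/* Hypothetical stand-in for struct call_single_data. */
struct demo_csd {
	struct demo_csd *next;
	void (*func)(void *info);
	void *info;
	atomic_uint flags;
};

/* Hypothetical stand-in for the per-CPU queue plus the "IPI line". */
struct demo_queue {
	_Atomic(struct demo_csd *) head;
	atomic_bool ipi_pending;
};

static void demo_queue_init(struct demo_queue *q)
{
	atomic_init(&q->head, (struct demo_csd *)NULL);
	atomic_init(&q->ipi_pending, false);
}

static void demo_csd_init(struct demo_csd *d, void (*func)(void *), void *info)
{
	d->next = NULL;
	d->func = func;
	d->info = info;
	atomic_init(&d->flags, DEMO_CSD_FLAG_WAIT);
}

/* Example callback: bumps the int that info points to. */
static void demo_bump(void *info)
{
	++*(int *)info;
}

/* Sender side: publish the request, fence, then notify. */
static void demo_exec_single(struct demo_queue *q, struct demo_csd *d)
{
	/* Single-producer push; the kernel serializes this with dst->lock. */
	d->next = atomic_load_explicit(&q->head, memory_order_relaxed);
	atomic_store_explicit(&q->head, d, memory_order_relaxed);

	/* Counterpart of the added smp_mb(): list addition before the "IPI". */
	atomic_thread_fence(memory_order_seq_cst);

	/* Stand-in for arch_send_call_function_single_ipi(). */
	atomic_store_explicit(&q->ipi_pending, true, memory_order_relaxed);
}

/* Receiver side: returns how many queued requests were run. */
static int demo_handle_ipi(struct demo_queue *q)
{
	struct demo_csd *d;
	int ran = 0;

	if (!atomic_exchange_explicit(&q->ipi_pending, false, memory_order_relaxed))
		return 0;

	/* Assumed stand-in for the barrier in the handler (stronger than needed). */
	atomic_thread_fence(memory_order_acquire);

	d = atomic_exchange_explicit(&q->head, (struct demo_csd *)NULL,
				     memory_order_relaxed);
	while (d) {
		struct demo_csd *next = d->next;

		assert(d->func != NULL);
		d->func(d->info);
		/* Let a waiting sender go; it only polls this flag. */
		atomic_fetch_and_explicit(&d->flags, ~DEMO_CSD_FLAG_WAIT,
					  memory_order_release);
		d = next;
		ran++;
	}
	return ran;
}

/* What the csd_flag_wait() loop polls, with no extra full barrier. */
static bool demo_csd_waiting(struct demo_csd *d)
{
	return (atomic_load_explicit(&d->flags, memory_order_acquire) &
		DEMO_CSD_FLAG_WAIT) != 0;
}
```

Calling demo_exec_single() and then demo_handle_ipi() from a single thread
only checks the bookkeeping; the fences matter when the sender and the handler
run on different CPUs.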
diff --git a/kernel/smp.c b/kernel/smp.c
index f362a85..75c8dde 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -51,10 +51,6 @@ static void csd_flag_wait(struct call_single_data *data)
{
/* Wait for response */
do {
- /*
- * We need to see the flags store in the IPI handler
- */
- smp_mb();
if (!(data->flags & CSD_FLAG_WAIT))
break;
cpu_relax();
@@ -76,6 +72,11 @@ static void generic_exec_single(int cpu, struct call_single_data *data)
list_add_tail(&data->list, &dst->list);
spin_unlock_irqrestore(&dst->lock, flags);
+ /*
+ * Make the list addition visible before sending the ipi.
+ */
+ smp_mb();
+
if (ipi)
arch_send_call_function_single_ipi(cpu);
@@ -157,7 +158,7 @@ void generic_smp_call_function_single_interrupt(void)
* Need to see other stores to list head for checking whether
* list is empty without holding q->lock
*/
- smp_mb();
+ smp_read_barrier_depends();
...No. We want the IPI receiver to see the new consistent data rather than
possibly old consistent data. And on x86, smp_wmb() is a simple barrier()
(in !CONFIG_X86_OOSTORE), which doesn't do much in this case. On x86, mfence
(smp_mb()) will ensure that the MSR-based APIC (x2apic) accesses (the IPI)
become visible only after the memory operations before smp_mb() are made
visible.
thanks,
suresh
--
OK, I'm convinced. I'll queue up the patch, thanks!
--
Jens Axboe
--
nice! Did you see an actual lockup due to this? Seems like a v2.6.28 fix to
me in any case.

Ingo
--
We didn't see the lockup in our tests, but Xen folks reported similar failures.

Yes.
thanks,
suresh
--
...really? I don't remember anything like that, but perhaps I'm
forgetting something. In Xen the IPI is sent with a hypercall, which is
definitely a solid enough barrier for these purposes.
J
--
i think Suresh might be referring to some of the fragilities Xen had with
generic-ipi. But those AFAICT were due to the on-stack lifetime bug that Nick
fixed via the kmalloc? v2.6.26-ish issue.

Ingo
--
Right, that's all I could think of.
J
--
No. I am referring to the Xen hypervisor code fix recently done by the Xen
team at Intel.

http://xenbits.xensource.com/xen-unstable.hg?rev/50170dc8649c

thanks,
suresh
--
ok - so that makes it a v2.6.28 item i guess.

Ingo
--
The case Suresh is talking about was a fix to Xen itself, rather than on
the kernel side, so it doesn't need to be a .28 issue on Xen's account.
J
--
ok - but still the portion of the fix that strengthens barriers looks obvious
to have and there's little downside that i can see.

Suresh, you might want to split the patch(es) in two: get the barrier
strengthening changes into v2.6.28 (to fix the x2apic bug), while the aspects
that _weaken_ barriers can wait for v2.6.29. With that it would be a 100% safe
change for v2.6.28-rc4.

Ingo
--
Ok. I just posted three patches (including the x86 specific change).

[patch 1/3] generic-ipi: add smp_mb() before sending the IPI
[patch 2/3] x86: Add smp_mb() before sending INVALIDATE_TLB_VECTOR
[patch 3/3] generic-ipi: fix the smp_mb() usage

First two patches are safe to go into v2.6.28. Third patch can wait for
v2.6.29.
thanks,
suresh
--
I already have the combined 1+3 patch queued up...
--
Jens Axboe
--
