[PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: David Dillow
Date: Friday, May 22, 2009 - 6:29 pm

The 8169 chip only generates MSI interrupts when all enabled event
sources are quiescent and one or more sources transition to active. If
not all of the active events are acknowledged, or a new event becomes
active while the existing ones are cleared in the handler, we will not
see a new interrupt.

The current interrupt handler masks off the Rx and Tx events once the
NAPI handler has been scheduled, which opens a race window in which we
can get another Rx or Tx event and never ACK'ing it, stopping all
activity until the link is reset (ifconfig down/up). Fix this by always
ACK'ing all event sources, and loop in the handler until we have all
sources quiescent.

Signed-off-by: David Dillow <dave@thedillows.org>
---
This fixes the lockups I've seen. Both MSI and level-triggered interrupt
configurations survive over an hour of testing when it would lockup in
under 90 seconds before. I am certain of the analysis of the root cause,
but there may be better ways to fix it. There may also be a theoretical
race window between the ending of a NAPI poll cycle and a link change
interrupt coming in, but I'm not sure it would matter. 

Some variant of this should also be applied to the currently running
stable trees, as the problem is long-standing.

 r8169.c |  102 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 0b6e8c8..bdc8d5a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3554,54 +3554,64 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	int handled = 0;
 	int status;
 
+	/* loop handling interrupts until we have no new ones or
+	 * we hit a invalid/hotplug case.
+	 */
 	status = RTL_R16(IntrStatus);
+	while (status && status != 0xffff) {
+		handled = 1;
 
-	/* hotplug/major error/no more work/shared irq */
-	if ((status == 0xffff) || !status)
-		goto out;
-
-	handled = 1;
+		/* Handle all of the error cases first. These will reset
+		 * the chip, so just exit the loop.
+		 */
+		if (unlikely(!netif_running(dev))) {
+			rtl8169_asic_down(ioaddr);
+			break;
+		}
 
-	if (unlikely(!netif_running(dev))) {
-		rtl8169_asic_down(ioaddr);
-		goto out;
-	}
+		/* Work around for rx fifo overflow */
+		if (unlikely(status & RxFIFOOver) &&
+	    	(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+			netif_stop_queue(dev);
+			rtl8169_tx_timeout(dev);
+			break;
+		}
 
-	status &= tp->intr_mask;
-	RTL_W16(IntrStatus,
-		(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		if (unlikely(status & SYSErr)) {
+			rtl8169_pcierr_interrupt(dev);
+			break;
+		}
 
-	if (!(status & tp->intr_event))
-		goto out;
+		if (status & LinkChg)
+			rtl8169_check_link_status(dev, tp, ioaddr);
 
-	/* Work around for rx fifo overflow */
-	if (unlikely(status & RxFIFOOver) &&
-	    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
-		netif_stop_queue(dev);
-		rtl8169_tx_timeout(dev);
-		goto out;
-	}
+		/* We need to see the lastest version of tp->intr_mask to
+		 * avoid ignoring an MSI interrupt and having to wait for
+		 * another event which may never come.
+		 */
+		smp_rmb();
+		if (status & tp->intr_mask & tp->napi_event) {
+			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+			tp->intr_mask = ~tp->napi_event;
+
+			if (likely(napi_schedule_prep(&tp->napi)))
+				__napi_schedule(&tp->napi);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+			       	dev->name, status);
+			}
+		}
 
-	if (unlikely(status & SYSErr)) {
-		rtl8169_pcierr_interrupt(dev);
-		goto out;
+		/* We only get a new MSI interrupt when all active irq
+		 * sources on the chip have been acknowledged. So, ack
+		 * everything we've seen and check if new sources have become
+		 * active to avoid blocking all interrupts from the chip.
+		 */
+		RTL_W16(IntrStatus,
+			(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		status = RTL_R16(IntrStatus);
 	}
 
-	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp, ioaddr);
-
-	if (status & tp->napi_event) {
-		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-		tp->intr_mask = ~tp->napi_event;
-
-		if (likely(napi_schedule_prep(&tp->napi)))
-			__napi_schedule(&tp->napi);
-		else if (netif_msg_intr(tp)) {
-			printk(KERN_INFO "%s: interrupt %04x in poll\n",
-			       dev->name, status);
-		}
-	}
-out:
 	return IRQ_RETVAL(handled);
 }
 
@@ -3617,13 +3627,15 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 
 	if (work_done < budget) {
 		napi_complete(napi);
-		tp->intr_mask = 0xffff;
-		/*
-		 * 20040426: the barrier is not strictly required but the
-		 * behavior of the irq handler could be less predictable
-		 * without it. Btw, the lack of flush for the posted pci
-		 * write is safe - FR
+
+		/* We need for force the visibility of tp->intr_mask
+		 * for other CPUs, as we can loose an MSI interrupt
+		 * and potentially wait for a retransmit timeout if we don't.
+		 * The posted write to IntrMask is safe, as it will
+		 * eventually make it to the chip and we won't loose anything
+		 * until it does.
 		 */
+		tp->intr_mask = 0xffff;
 		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
 	}



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, David Dillow, (Fri May 22, 6:29 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Sat May 23, 2:24 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Sat May 23, 7:35 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Sat May 23, 7:44 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Sat May 23, 8:01 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Sat May 23, 9:12 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Sat May 23, 9:40 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Sat May 23, 9:45 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Sat May 23, 9:50 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Sat May 23, 9:53 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, David Dillow, (Sat May 23, 10:03 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Francois Romieu, (Sun May 24, 2:15 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, David Miller, (Mon May 25, 10:55 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Tue May 26, 11:22 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Tue May 26, 3:40 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Buesch, (Wed May 27, 9:19 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Fri Aug 21, 1:57 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michael Riepe, (Fri Aug 21, 2:22 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Fri Aug 21, 5:24 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Sat Aug 22, 4:48 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Sat Aug 22, 5:07 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Jarek Poplawski, (Sun Aug 23, 10:17 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Michal Soltys, (Sun Aug 23, 10:43 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Jarek Poplawski, (Sun Aug 23, 10:54 am)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Sun Aug 23, 7:37 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Mon Aug 24, 5:51 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Tue Aug 25, 1:22 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Tue Aug 25, 2:24 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Tue Aug 25, 2:37 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Francois Romieu, (Tue Aug 25, 3:19 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Francois Romieu, (Tue Aug 25, 4:11 pm)
Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts, Eric W. Biederman, (Tue Aug 25, 8:47 pm)
[PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Wed Aug 26, 12:58 am)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Wed Aug 26, 1:02 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Francois Romieu, (Wed Aug 26, 2:30 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Wed Aug 26, 2:40 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Francois Romieu, (Wed Aug 26, 10:24 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Wed Aug 26, 10:38 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Francois Romieu, (Thu Aug 27, 4:20 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Thu Aug 27, 6:17 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Francois Romieu, (Sun Aug 30, 1:37 pm)
Re: [PATCH] r8169: Reduce looping in the interrupt handler., Eric W. Biederman, (Sun Aug 30, 1:53 pm)