Re: [PATCH / RFC] net: don't grab a mutex within a timer context in gianfar

Previous thread: [PATCH] igb/ixgbe/e1000e: resolve tx multiqueue bug by Jeff Kirsher on Friday, July 18, 2008 - 4:02 am. (2 messages)

Next thread: [patch 1/3] Cleanup claw printk messages. by Ursula Braun on Friday, July 18, 2008 - 6:24 am. (2 messages)
From: Sebastian Siewior
Date: Friday, July 18, 2008 - 5:10 am

Commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]
changed the phydev->lock from spinlock into a mutex. Now, the following
code path got triggered while NFS was unavailable:

|[   21.287359] nfs: server 10.11.3.47 not responding, still trying
|[   38.891373] nfs: server 10.11.3.47 not responding, still trying
|[  148.179592] INFO: task udevd:1762 blocked for more than 120 seconds.
|[  148.185967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
|[  148.193810] udevd         D 0fef1dd8     0  1762   1761
|[  148.199055] Call Trace:
|[  148.201504] [cecdda80] [c00071e4] __switch_to+0x6c/0x84
|[  148.206764] [cecddaa0] [c025973c] schedule+0x46c/0x4cc
|[  148.211937] [cecddad0] [c00f3d84] nfs_wait_schedule+0x24/0x38
|[  148.217712] [cecddae0] [c0259b74] __wait_on_bit_lock+0x68/0xcc
|[  148.223576] [cecddb00] [c0259c4c] out_of_line_wait_on_bit_lock+0x74/0x88
|[  148.230300] [cecddb50] [c00f3e6c] __nfs_revalidate_inode+0xd4/0x264
|[  148.236597] [cecddc20] [c00f1298] nfs_lookup_revalidate+0x1bc/0x3d4
|[  148.243071] [cecddd80] [c0081db8] do_lookup+0x148/0x1a0
|[  148.248361] [cecdddb0] [c0083bac] __link_path_walk+0x930/0xe24
|[  148.254219] [cecdde00] [c00840e8] path_walk+0x48/0xa8
|[  148.259293] [cecdde30] [c008442c] do_path_lookup+0x160/0x194
|[  148.264982] [cecdde60] [c0084fe0] __path_lookup_intent_open+0x58/0xa4
|[  148.271444] [cecdde80] [c007ea54] open_exec+0x2c/0xdc
|[  148.276525] [cecddef0] [c007efa4] do_execve+0x58/0x1c4
|[  148.281704] [cecddf20] [c0007568] sys_execve+0x58/0x84
|[  148.286873] [cecddf40] [c000df58] ret_from_syscall+0x0/0x3c
|[  169.651632] INFO: task udevsettle:1053 blocked for more than 120 seconds.

some more of this and now the interresting part:

|[  194.859659] NETDEV WATCHDOG: eth0: transmit timed out
|[  194.864733] BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
|[  194.875529] in_atomic():1, irqs_disabled():0
|[  194.879805] Call ...
From: Nate Case
Date: Monday, July 21, 2008 - 3:57 pm

Hmm..  I'm not sure what the best solution is to this.  Make the
stop_gfar() call happen in a workqueue, and make a similar change to
ucc_geth, fec_mpc52xx, and fs_enet? Modify phy_stop() to do the work in
a workqueue conditionally if in interrupt context?  Between these two
I'd lean toward the latter.

Does anyone have any better ideas?

-- 
Nate Case <ncase@xes-inc.com>

--

From: Sebastian Siewior
Date: Tuesday, July 22, 2008 - 1:59 pm

If I look at tg3.c than exactly this is done. Others call it only on
close(). I guess this depends very much on driver's logic :)
If nobody minds, than I would assume that tg3.c is a good example and I
would move the timout path into a workqueu.

Sebastian
--

From: Sebastian Siewior
Date: Wednesday, July 23, 2008 - 1:03 pm

From: Sebastian Siewior <bigeasy@linutronix.de>

I got the following backtrace while network was unavailble:

|NETDEV WATCHDOG: eth0: transmit timed out
|BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/kernel/mutex.c:87
|in_atomic():1, irqs_disabled():0
|Call Trace:
|[c0383d90] [c0006dd8] show_stack+0x48/0x184 (unreliable)
|[c0383db0] [c001e938] __might_sleep+0xe0/0xf4
|[c0383dc0] [c025a43c] mutex_lock+0x24/0x3c
|[c0383de0] [c019005c] phy_stop+0x20/0x70
|[c0383df0] [c018d4ec] stop_gfar+0x28/0xf4
|[c0383e10] [c018e8c4] gfar_timeout+0x30/0x60
|[c0383e20] [c01fe7c0] dev_watchdog+0xa8/0x144
|[c0383e30] [c002f93c] run_timer_softirq+0x148/0x1c8
|[c0383e60] [c002b084] __do_softirq+0x5c/0xc4
|[c0383e80] [c00046fc] do_softirq+0x3c/0x54
|[c0383e90] [c002ac60] irq_exit+0x3c/0x5c
|[c0383ea0] [c000b378] timer_interrupt+0xe0/0xf8
|[c0383ec0] [c000e5ac] ret_from_except+0x0/0x18
|[c0383f80] [c000804c] cpu_idle+0xcc/0xdc
|[c0383fa0] [c025c07c] etext+0x7c/0x90
|[c0383fc0] [c0338960] start_kernel+0x294/0x2a8
|[c0383ff0] [c00003dc] skpinv+0x304/0x340
|------------[ cut here ]------------

The phylock was once a spinlock but got changed into a mutex via
commit 35b5f6b1a aka [PHYLIB: Locking fixes for PHY I/O potentially sleeping]

Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
---
bug report @ http://marc.info/?l=linux-netdev&m=121638307116389&w=2
I moved it into a workqueue, this is what tg3 does.
I would convert the other three drivers unless $dude suggests a better
method or somebody else takes care....

 drivers/net/gianfar.c |   22 ++++++++++++++++++----
 drivers/net/gianfar.h |    2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 25bdd08..caa6cbd 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -112,6 +112,7 @@ const char gfar_driver_version[] = "1.3";
 
 static int gfar_enet_open(struct net_device *dev);
 static int ...
From: Nate Case
Date: Friday, July 25, 2008 - 7:16 am

This looks good to me.

-- 
Nate Case <ncase@xes-inc.com>

--

From: Andy Fleming
Date: Friday, July 25, 2008 - 12:02 pm

Looks good to me.  Thanks for taking care of this.

Acked-by: Andy Fleming <afleming@freescale.com>
--

From: Benjamin Herrenschmidt
Date: Wednesday, July 23, 2008 - 3:12 pm

Move the reset task to a workqueue.

Cheers,
Ben.


--

From: Sebastian Siewior
Date: Thursday, July 24, 2008 - 12:27 am

Done in [1] Ben.


Sebastian
--

From: Wolfram Sang
Date: Tuesday, July 22, 2008 - 12:54 am

Hi,


Recently, I described a (I think) similar problem:
(http://ozlabs.org/pipermail/linuxppc-dev/2008-July/059686.html)

=3D=3D=3D

Hello,

today, I was debugging a kernel crash on a board with a MPC5200B using
2.6.26-rc9. I found the following code in drivers/net/fec_mpc52xx.c:

static irqreturn_t mpc52xx_fec_interrupt(int irq, void *dev_id)
{
[...]
	/* on fifo error, soft-reset fec */
	if (ievent & (FEC_IEVENT_RFIFO_ERROR | FEC_IEVENT_XFIFO_ERROR)) {

		if (net_ratelimit() && (ievent & FEC_IEVENT_RFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_RFIFO_ERROR\n");
		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");

		mpc52xx_fec_reset(dev);

		netif_wake_queue(dev);
		return IRQ_HANDLED;
	}
[...]
}

Calling mpc52xx_fec_reset() from interrupt context is bad, at least
because

a) it calls phy_write, which contains BUG_ON(in_interrupt())
b) it calls mpc52xx_fec_hw_init, which has a delay-loop to check
   if the reset was successful (1..50 us)

I assume the proper thing to do is to set a flag in the ISR and handle
the soft reset later in some other context. Having never dealt with the
network core and its drivers so far, I am not sure which place would be
the right one to perform the soft reset. To not make things worse, I
hope people with more insight to network stuff can deliver a suitable
solution to this problem.

All the best,

   Wolfram

=3D=3D=3D

--=20
  Dipl.-Ing. Wolfram Sang | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
Previous thread: [PATCH] igb/ixgbe/e1000e: resolve tx multiqueue bug by Jeff Kirsher on Friday, July 18, 2008 - 4:02 am. (2 messages)

Next thread: [patch 1/3] Cleanup claw printk messages. by Ursula Braun on Friday, July 18, 2008 - 6:24 am. (2 messages)