Re: notification for systemtap-related oops

From: Arjan van de Ven
Date: Friday, May 16, 2008 - 9:41 am

The http://www.kerneloops.org website collects kernel oops and
warning reports from various mailing lists and bugzillas, as well as
from a client that users can install to auto-submit oopses.
Below is a top 10 list of the oopses collected in the last 7 days.
(Reports for kernels prior to 2.6.23 have been omitted in collecting the top 10.)

This week, a total of 1617 oopses and warnings have been reported,
compared to 452 reports in the previous week. This sharp increase
is due to Fedora 9 being released, which includes the automatic
collection client.


Per file statistics
-------------------
743	kernel/sysctl.c
113	fs/buffer.c
76	fs/sysfs/dir.c
38	kernel/spinlock.c
22	fs/inotify.c
21	kernel/sysctl_check.c (P)
21	net/core/sock.c
18	fs/file_table.c
17	mm/page_alloc.c
15	lib/iomap.c


Bug of the week
---------------
Not quite in the top 10, but rising fast, is a bug that has a very
distinct pattern.
The backtraces are at http://www.kerneloops.org/searchweek.php?search=fput

The pattern is that the kernel gets an invalid pointer passed to fput(),
coming down from a select() system call done by the "wpa_supplicant" program.
The fact that it is ONLY wpa_supplicant implicates the wireless/network stack.
Another observation is that this only happens with 64-bit kernels, even though
a large portion of the users run 32-bit kernels. This implies that this is a
64-bit-specific bug. It appears that the top 32 bits of the pointers are getting
corrupted (the bottom part at least looks valid).
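
The "top half corrupted, bottom half valid" observation can be checked mechanically on a reported pointer value. The following is a minimal user-space sketch, assuming x86-64 with 48-bit virtual addresses (so a valid pointer has bits 63..47 all equal); the helper names are hypothetical and are not taken from the kernel or from kerneloops.org.

```c
#include <stdint.h>

/*
 * Assumption: x86-64 with 48-bit virtual addresses. An address is
 * "canonical" when bits 63..47 are all zero or all one.
 */
static int is_canonical_x86_64(uint64_t p)
{
	uint64_t top = p >> 47;		/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

/*
 * Returns 1 when 'suspect' has the same low 32 bits as 'reference'
 * but different high 32 bits, i.e. only the top half was damaged.
 */
static int only_high_half_differs(uint64_t reference, uint64_t suspect)
{
	uint32_t ref_lo = (uint32_t)(reference & 0xffffffffu);
	uint32_t sus_lo = (uint32_t)(suspect & 0xffffffffu);
	uint32_t ref_hi = (uint32_t)(reference >> 32);
	uint32_t sus_hi = (uint32_t)(suspect >> 32);

	return ref_lo == sus_lo && ref_hi != sus_hi;
}
```

For example, with the made-up values 0xffff8100deadbe00 (plausible) and 0x12348100deadbe00 (damaged), the first is canonical, the second is not, and only_high_half_differs() reports that the two differ in the top half only.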



Top 10 reported bugs
--------------------

Rank 1: __register_sysctl_paths
	Reported 741 times (1254 total reports)
	Duplicate /proc registration. Bugs in madwifi but also in the parport driver
	This oops was last seen in version 2.6.25.3, and first seen in 2.6.25-rc3.
	More info: http://www.kerneloops.org/searchweek.php?search=__register_sysctl_paths

Rank 2: mark_buffer_dirty
	Reported 110 times (306 total reports)
	EXT3 bug while hot-removing a USB device
	This oops was last seen ...
From: Evgeniy Polyakov
Date: Friday, May 16, 2008 - 10:14 am

A number of them, from the via-velocity driver, should be fixed by the
attached patch (added Francois Romieu <romieu@fr.zoreil.com> to copy), but
frankly that looks really bad. Allocations are protected by a lock which
is used for interrupts (that is safe, since the device is turned off),
but also for suspend (which can free them again, btw), the mii register
dump (which will break without the lock) and something else, which should
be fine though because of rtnl. What we could do better is to allocate the
new rings in advance, and only substitute pointers and write registers
under the lock. Francois?

diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
index 6b8d882..d6b7972 100644
--- a/drivers/net/via-velocity.c
+++ b/drivers/net/via-velocity.c
@@ -1251,7 +1251,7 @@ static int velocity_init_rd_ring(struct velocity_info *vptr)
 	vptr->rx_buf_sz = (mtu <= ETH_DATA_LEN) ? PKT_BUF_SZ : mtu + 32;
 
 	vptr->rd_info = kcalloc(vptr->options.numrx,
-				sizeof(struct velocity_rd_info), GFP_KERNEL);
+				sizeof(struct velocity_rd_info), GFP_ATOMIC);
 	if (!vptr->rd_info)
 		return -ENOMEM;
 
@@ -1324,7 +1324,7 @@ static int velocity_init_td_ring(struct velocity_info *vptr)
 
 		vptr->td_infos[j] = kcalloc(vptr->options.numtx,
 					    sizeof(struct velocity_td_info),
-					    GFP_KERNEL);
+					    GFP_ATOMIC);
 		if (!vptr->td_infos[j])	{
 			while(--j >= 0)
 				kfree(vptr->td_infos[j]);
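
The alternative suggested above (allocate the new rings in advance, then only substitute pointers under the lock) might look roughly like the following. This is a user-space analogy and not driver code: a pthread mutex stands in for the driver's spinlock, calloc() for kcalloc(), and struct dev, struct ring_info and dev_resize_ring() are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-descriptor bookkeeping. */
struct ring_info {
	void *buf;
};

struct dev {
	pthread_mutex_t lock;		/* stands in for the driver spinlock */
	struct ring_info *rd_info;	/* currently installed ring */
	size_t numrx;
};

/* Returns 0 on success, -1 on failure (the old ring is left untouched). */
static int dev_resize_ring(struct dev *d, size_t new_numrx)
{
	struct ring_info *fresh, *old;

	if (new_numrx == 0)
		return -1;

	/* 1. Allocate outside the lock, where blocking is allowed. */
	fresh = calloc(new_numrx, sizeof(*fresh));
	if (!fresh)
		return -1;

	/* 2. Only the pointer swap happens under the lock. */
	pthread_mutex_lock(&d->lock);
	old = d->rd_info;
	d->rd_info = fresh;
	d->numrx = new_numrx;
	pthread_mutex_unlock(&d->lock);

	/* 3. Release the old ring after dropping the lock. */
	free(old);
	return 0;
}
```

With this shape the allocation no longer runs in atomic context, so the GFP_KERNEL to GFP_ATOMIC change would not be needed; whether the real driver can be restructured this way is exactly the open question raised above.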


-- 
	Evgeniy Polyakov
--

From: Adrian Bunk
Date: Friday, May 16, 2008 - 11:04 am

Unless I misunderstand your web interface, another pattern is an "fc9" in
the version string.

My first guess would be that it might be a problem in some code that is 
only in Fedora kernels?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

--

From: Arjan van de Ven
Date: Friday, May 16, 2008 - 11:19 am

that's because fc9 is the only OS that currently ships the client by default,
which means that it's a statistical thing where 90%+ of the reports come from

that may or may not be true, but we can't conclude that right now.
--

From: Dave Jones
Date: Monday, May 19, 2008 - 8:53 pm

On Fri, May 16, 2008 at 09:04:26PM +0300, Adrian Bunk wrote:
 > On Fri, May 16, 2008 at 09:41:31AM -0700, Arjan van de Ven wrote:
 > >...
 > > Bug of the week
 > > ---------------
 > > Not in the top 10 (but barely not so), but upcoming fast is a bug that has a very
 > > distinct pattern.
 > > The backtraces are at http://www.kerneloops.org/searchweek.php?search=fput
 > >
 > > The pattern is that the kernel gets an invalid pointer passed to fput(),
 > > coming down from a select() system call done by the "wpa_supplicant" program.
 > > The fact that it is ONLY wpa_supplicant implicates the wireless/network stack.
 > > Another observation is that this only happens with 64 bit kernels, even though
 > > a large portion of the users uses 32 bit kernels. This implies that this is a 64-bit
 > > type of bug. It appears that the top 32 bit of the pointers is getting corrupted
 > > (the bottom part at least looks valid).
 > >...
 > 
 > Unless I misunderstand your webinterface another pattern is a "fc9" in 
 > the version string.

Unsurprising really given we just did a release, and not many other distros
are enabling kerneloops by default yet.

 > My first guess would be that it might be a problem in some code that is 
 > only in Fedora kernels?

Very likely, though it's worth noting that all the wireless patches we have
in f9 are from wireless.git, so they're valid 2.6.26-rc bugs.

	Dave

-- 
http://www.codemonkey.org.uk
--

From: Frank Ch. Eigler
Date: Friday, May 16, 2008 - 11:50 am

Hi -

Arjan, would it be possible for kerneloops.org to notify us systemtap
people (cc:d) automagically if the oops messages implicate systemtap
modules by including "stap_.*" in the loaded-module list or stack
backtrace symbols?
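
The kind of filter being asked for could be sketched as below, using POSIX regcomp()/regexec(). This is only an illustration: the function name is hypothetical, nothing here reflects how kerneloops.org actually processes reports, and the pattern is a slightly tightened variant of "stap_.*" that requires "stap_" to start a symbol or module name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if the oops text mentions a name starting with "stap_",
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int oops_mentions_systemtap(const char *oops_text)
{
	regex_t re;
	int rc;

	/* "stap_" at the start of a line or after a non-identifier character. */
	if (regcomp(&re, "(^|[^A-Za-z0-9_])stap_[A-Za-z0-9_]*",
		    REG_EXTENDED | REG_NOSUB | REG_NEWLINE) != 0)
		return -1;

	rc = regexec(&re, oops_text, 0, NULL, 0);
	regfree(&re);

	return rc == 0 ? 1 : 0;
}
```

A made-up line such as "Modules linked in: stap_abc123 ext3" would match, while "Modules linked in: ext3 usbcore" would not.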

- FChE
--

From: Arjan van de Ven
Date: Friday, May 16, 2008 - 12:47 pm

I can get you an RSS feed for that....

doing something more proactive is harder due to the unscalable nature
--

From: Frank Ch. Eigler
Date: Friday, May 16, 2008 - 1:23 pm

Hi -


RSS would be fine, and can be converted to email by other tools.

- FChE
--
