Re: 2.6.20.3: kernel BUG at mm/slab.c:597 try#2

From: Andreas Steinmetz
Date: Sunday, March 18, 2007 - 5:34 pm

As posted to lkml and linux-scsi on 2007-03-15 without reply, see
http://marc.info/?l=linux-kernel&m=117395128412313&w=2 for original post:

It is not so nice when one can write backup tapes but the tapes cannot
be read. I don't know if memory management or the st driver is the
culprit, but this is not a nice situation.

I can't even say if the tapes are written correctly as I can't read them
(one does not reboot production machines back to 2.4.x just to try to
read a backup tape - I don't have 2.6.x older than 2.6.20 on these
machines).
-- 
Andreas Steinmetz                       SPAMmers use robotrap@domdv.de

-

From: Andrew Morton
Date: Sunday, March 18, 2007 - 11:00 pm

Repeatable oops in our most recently released kernel, nobody bothers to

        BUG_ON(!PageSlab(page));

that's seriously screwed up.  Do you have CONFIG_DEBUG_SLAB enabled?  If
not, please enable it and retest.

-

From: Pekka Enberg
Date: Monday, March 19, 2007 - 1:00 am

This is scary. Looking at disassembly of the OOPS:

Disassembly of section .text:

00000000 <.text>:
   0:   5f                      pop    %edi
   1:   c3                      ret
   2:   57                      push   %edi
   3:   89 c1                   mov    %eax,%ecx
   5:   89 d7                   mov    %edx,%edi
   7:   8d 92 00 00 00 40       lea    0x40000000(%edx),%edx
   d:   56                      push   %esi
   e:   c1 ea 0c                shr    $0xc,%edx
  11:   53                      push   %ebx
  12:   c1 e2 05                shl    $0x5,%edx
  15:   03 15 40 5d 5a c0       add    0xc05a5d40,%edx

At this point, edx has the result of virt_to_page().

  1b:   8b 02                   mov    (%edx),%eax
  1d:   f6 c4 40                test   $0x40,%ah
  20:   74 03                   je     0x25

If it's a compound page, look up the real page from ->private.

  22:   8b 52 0c                mov    0xc(%edx),%edx

Now, reload page flags.

  25:   8b 02                   mov    (%edx),%eax

And test...

  27:   a8 80                   test   $0x80,%al
  29:   75 04                   jne    0x2f
  2b:   0f 0b                   ud2a
  2d:   eb fe                   jmp    0x2d
  2f:   39 4a 18                cmp    %ecx,0x18(%edx)

[snip, snip]

EIP is at kmem_cache_free+0x29/0x5a
eax: c1800000   ebx: f0ae12c0   ecx: c18f73c0   edx: c1800000
esi: c1919de0   edi: 00000000   ebp: 00001000   esp: f1fe7e14
ds: 007b   es: 007b   ss: 0068

But somehow eax and edx have the same value 0xc1800000 here. Hmm?

                                   Pekka
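
For readers following the disassembly, the address arithmetic and the two
flag tests can be restated in plain C. This is only a user-space sketch:
the constants are read straight from the instructions quoted above, while
every name (sketch_page_offset and so on) is made up for illustration and
is not a kernel identifier.

```c
#include <stdint.h>
#include <stdbool.h>

/* Constants taken from the quoted instructions; names are hypothetical. */
#define SKETCH_LOWMEM_BIAS       0x40000000u /* lea 0x40000000(%edx),%edx:
                                                mod 2^32 this is the same as
                                                subtracting 0xc0000000 */
#define SKETCH_PAGE_SHIFT        12u         /* shr $0xc,%edx */
#define SKETCH_PAGE_STRUCT_SHIFT 5u          /* shl $0x5,%edx: 32 bytes per
                                                page descriptor */
#define SKETCH_FLAG_COMPOUND     0x4000u     /* test $0x40,%ah -> bit 14 */
#define SKETCH_FLAG_SLAB         0x0080u     /* test $0x80,%al -> bit 7 */

/* Byte offset into the page descriptor array for a 32-bit virtual address.
 * The disassembly then adds the pointer stored at 0xc05a5d40 to this. */
static uint32_t sketch_page_offset(uint32_t vaddr)
{
    uint32_t pfn = (uint32_t)(vaddr + SKETCH_LOWMEM_BIAS) >> SKETCH_PAGE_SHIFT;
    return pfn << SKETCH_PAGE_STRUCT_SHIFT;
}

/* Mirrors "test $0x40,%ah; je": is the compound bit set? */
static bool sketch_is_compound(uint32_t flags)
{
    return (flags & SKETCH_FLAG_COMPOUND) != 0;
}

/* Mirrors "test $0x80,%al; jne": is the slab bit set? */
static bool sketch_is_slab(uint32_t flags)
{
    return (flags & SKETCH_FLAG_SLAB) != 0;
}
```

With these helpers a NULL pointer (edi is 00000000 in the register dump)
gives an offset of 0x800000. If eax still holds the flags word at the
trap, 0xc1800000 has bit 7 clear, which is the condition under which the
jne at 0x29 falls through to ud2a.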
-

From: Pekka Enberg
Date: Monday, March 19, 2007 - 1:32 am

Aah, but if you look at contents of the stack:

Stack: f0ae12c0 c1919de0 ffffffea c0137f97 00000000 f0ae12c0 c1919e20 c0168d45
       f0ae12c0 00001000 c0168fb9 c02a77e3 00001000 00000000 00000000 00000000
       00000000 c17bb6e0 00001000 00000000 f1b38be8 00000003 f54ac050 c1b9d6e8
Call Trace:
 [<c0137f97>] mempool_free+0x48/0x4c
 [<c0168d45>] bio_free+0x21/0x2c
 [<c0168fb9>] bio_put+0x22/0x23

You can see that mempool_free is passing a NULL pointer to
kmem_cache_free() which doesn't handle it properly. The NULL pointer
comes from bio_free() where ->bi_io_vec is NULL because nr_iovecs
passed to bio_alloc_bioset() was zero.

The question is, why is nr_pages zero in scsi_req_map_sg()?
-

From: Pekka Enberg
Date: Monday, March 19, 2007 - 1:35 am

Note that the following patch I posted only addresses the part where
slab is clearly failing here:

http://lkml.org/lkml/2007/3/19/42

So, while it should fix the oops, there might be a bug lurking in the
SCSI or block layer still.
-

From: Mike Christie
Date: Monday, March 19, 2007 - 10:49 am

Could you try this patch
http://marc.info/?l=linux-scsi&m=116464965414878&w=2
I thought st was modified to not send offsets in the last elements but
it looks like it wasn't.
-

From: James Bottomley
Date: Monday, March 19, 2007 - 11:29 am

Actually, there are two patches in the email referred to.  If the
analysis that we're passing NULL to mempool_free is correct, it should
be the second one that fixes the problem (the one that checks
bio->bi_io_vec before freeing it).  Which would mean we have a
nr_vecs==0 bio generated by the tar somehow.
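
The idea of checking bio->bi_io_vec before freeing it can be illustrated
outside the kernel. The sketch below uses toy types and a toy pool; none
of these names exist in the kernel, and it is not the patch referred to
above, only a model of the guard under the assumption that a zero-vector
bio carries a NULL vector pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT the kernel's struct bio or mempool API. */
struct toy_bio {
    void *io_vec;        /* plays the role of ->bi_io_vec */
    unsigned nr_vecs;
};

static int toy_pool_frees;   /* counts elements handed back to the pool */

/* Models a pool free routine that, like the slab path in the oops,
 * is not meant to receive NULL. */
static void toy_pool_free(void *elem)
{
    toy_pool_frees++;
    free(elem);
}

/* With nr_vecs == 0 no vector array is allocated and io_vec stays NULL. */
static struct toy_bio *toy_bio_alloc(unsigned nr_vecs)
{
    struct toy_bio *bio = calloc(1, sizeof(*bio));

    if (!bio)
        return NULL;
    bio->nr_vecs = nr_vecs;
    if (nr_vecs) {
        bio->io_vec = calloc(nr_vecs, 16);
        if (!bio->io_vec) {
            free(bio);
            return NULL;
        }
    }
    return bio;
}

/* The guard: only hand the vector back to the pool if there is one. */
static void toy_bio_free(struct toy_bio *bio)
{
    if (bio->io_vec)
        toy_pool_free(bio->io_vec);
    free(bio);
}
```

Freeing a toy bio created with zero vectors never reaches toy_pool_free(),
while one created with a non-zero count does exactly once.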

James


-

From: Mike Christie
Date: Monday, March 19, 2007 - 12:06 pm

I think we might only need the first patch if the problem is similar to
what the lsi guys were seeing. I thought the problem is that we are not
estimating how large the transfer is correctly because we do not take
into account offsets at the end. This results in nr_vecs being zero when
it should be a valid value. I thought Kai's patch:
http://bugzilla.kernel.org/show_bug.cgi?id=7919
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=9abe16...
fixed the problem on st's side, but I guess not, so you are probably right.
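
To make the estimation problem concrete, here is a small user-space model.
It is an assumption-laden sketch, not the scsi_lib code: it supposes the
estimate is derived from the total length plus only the first element's
offset, and compares that with the number of pages touched when every
element's own offset is counted. The struct and function names are
hypothetical.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical, simplified scatter-gather element. */
struct sketch_sg {
    unsigned length;
    unsigned offset;   /* offset of the data within its page */
};

/* Estimate from the total length and the first element's offset only. */
static unsigned sketch_estimate_pages(const struct sketch_sg *sgl, size_t n)
{
    unsigned total = 0;
    unsigned first_off = n ? sgl[0].offset : 0;

    for (size_t i = 0; i < n; i++)
        total += sgl[i].length;
    return (total + first_off + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Pages actually touched when each element's offset is honoured. */
static unsigned sketch_needed_pages(const struct sketch_sg *sgl, size_t n)
{
    unsigned pages = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned off = sgl[i].offset % SKETCH_PAGE_SIZE;

        if (sgl[i].length == 0)
            continue;
        pages += (off + sgl[i].length + SKETCH_PAGE_SIZE - 1)
                 / SKETCH_PAGE_SIZE;
    }
    return pages;
}
```

For two elements of 4096 bytes that each start 2048 bytes into a page,
the estimate is 3 pages while 4 are touched. In this model the budget
runs out before the data does, which is the kind of shortfall that could
leave a later allocation with a count of zero.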

Here is a patch that dumps the sgl we are getting from st so we can see
for sure what we are getting and can decide if we need the first patch,
second patch or both.
-

From: Mike Christie
Date: Monday, March 19, 2007 - 2:12 pm

Oh, I noticed that the subject for the mail references 2.6.20.3 and the
patch for st in the bugzilla did not make it into 2.6.20 and is not in .3.
Could we try the st patch in the bugzilla first?
-

From: Andreas Steinmetz
Date: Monday, March 19, 2007 - 4:25 pm

Ok, the st patch from bugzilla solves the problem (tested on both
affected machines).
-- 
Andreas Steinmetz                       SPAMmers use robotrap@domdv.de
-

From: Andrew Morton
Date: Monday, March 19, 2007 - 4:40 pm

On Tue, 20 Mar 2007 00:25:02 +0100


If you're referring to the below patch then it's already in mainline, and
has been for a month.

Have you tested 2.6.21-rc4?  If not, please do so.

Perhaps we should merge this into 2.6.20.x?



commit 9abe16c670bd3d4ab5519257514f9f291383d104
Author: Kai Makisara <Kai.Makisara@kolumbus.fi>
Date:   Sat Feb 3 13:21:29 2007 +0200

    [SCSI] st: fix Tape dies if wrong block size used, bug 7919
    
    On Thu, 1 Feb 2007, Andrew Morton wrote:
    > On Thu, 1 Feb 2007 15:34:29 -0800
    > bugme-daemon@bugzilla.kernel.org wrote:
    >
    > > http://bugzilla.kernel.org/show_bug.cgi?id=7919
    > >
    > >            Summary: Tape dies if wrong block size used
    > >     Kernel Version: 2.6.20-rc5
    > >             Status: NEW
    > >           Severity: normal
    > >              Owner: scsi_drivers-other@kernel-bugs.osdl.org
    > >          Submitter: dmartin@sccd.ctc.edu
    > >
    > >
    > > Most recent kernel where this bug did *NOT* occur: 2.6.17.14
    > >
    > > Other Kernels Tested and Results:
    > >
    > >     OK 2.6.15.7
    > >     OK 2.6.16.37
    > >     OK 2.6.17.14
    > >     BAD 2.6.18.6
    > >     BAD 2.6.18-1.2869.fc6
    > >     BAD 2.6.19.2 +
    > >     BAD 2.6.20-rc5
    > >
    > > NOTE: 2.6.18-1.2869.fc6 is a Fedora modified kernel, all others are from kernel.org
    > >
    ...
    > > Steps to reproduce:
    > > Get a Adaptec AHA-2940U/UW/D / AIC-7881U card and a tape drive,
    > > install a recent kernel
    > > set the tape block size - mt setblk 4096
    > > read from or write to tape using wrong block size - tar -b 7 -cvf /dev/tape foo
    > >
    Write does not trigger this bug because the driver refuses in fixed block
    mode writes that are not a multiple of the block size. Read does trigger
    it in my system.
    
    The bug is not associated with any specific HBA. st tries to do direct i/o
    in fixed block mode with reads that are not a multiple of tape block ...
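
The mismatch in the reproduction steps is simple arithmetic: tar's -b
option counts 512-byte units, so "tar -b 7" uses 3584-byte records, which
is not a multiple of the 4096-byte block size set with "mt setblk 4096".
The sketch below only illustrates that check; the function names are
hypothetical and this is not the st driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TAR_RECORD_UNIT 512u   /* tar's -b blocking factor counts 512-byte units */

/* Record size in bytes for a given tar blocking factor. */
static size_t tar_record_bytes(unsigned blocking_factor)
{
    return (size_t)blocking_factor * TAR_RECORD_UNIT;
}

/* True if a transfer of 'count' bytes is a whole number of tape blocks. */
static bool fits_fixed_block(size_t count, size_t block_size)
{
    return block_size != 0 && count % block_size == 0;
}
```

Here fits_fixed_block(tar_record_bytes(7), 4096) is false, whereas a
blocking factor of 8 or 16 would give a whole number of 4096-byte blocks.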
-

From: Andreas Steinmetz
Date: Monday, March 19, 2007 - 4:46 pm

Sorry, this is not possible on these machines. They are production
servers and every problem on them that cannot be easily solved via



-- 
Andreas Steinmetz                       SPAMmers use robotrap@domdv.de
-

From: Andreas Steinmetz
Date: Monday, March 19, 2007 - 4:25 pm

Here's the patch output:

sg length 6 offset 0
sg length 12 offset 0
sg length 4096 offset 0
sg length 4096 offset 0
sg length 2048 offset 0
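
The debugging patch itself is not reproduced in this thread. As a rough
idea of what produces lines in this shape, a user-space loop over a
hypothetical element array might look like the following; the struct, the
function name and the use of fprintf() are assumptions for illustration
(kernel code would use printk and the real scatterlist type).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified scatter-gather element. */
struct dump_sg {
    unsigned length;
    unsigned offset;
};

/* Print one line per element and return the total number of bytes. */
static unsigned long dump_sgl(FILE *out, const struct dump_sg *sgl, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        fprintf(out, "sg length %u offset %u\n",
                sgl[i].length, sgl[i].offset);
        total += sgl[i].length;
    }
    return total;
}
```

Fed the five elements listed above, the lengths add up to 10258 bytes
spread over five separate elements, all at offset 0.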

Please note (as replied in the other mail) that the bugzilla patch


-- 
Andreas Steinmetz                       SPAMmers use robotrap@domdv.de
-

From: Gene Heskett
Date: Monday, March 19, 2007 - 2:47 pm

James, could this also be the cause of a tar based backup going crazy and 
thinking all data is new under any 2.6.21-rc* kernel I've tested so far 
with amanda, which in my case uses tar?  I've tried the fedora patched 
tar-1.15-1, and one I hand built right after 1.15-1 came out over a year 
ago, and they both do it, but only when booted to a 2.6.21-rc* kernel.

This obviously will be a show-stopper, either for amanda (and by 
inference, any app that uses tar), or for the migration of an amanda 



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
fractal radiation jamming the backbone
-

From: James Bottomley
Date: Monday, March 19, 2007 - 3:06 pm

Er, I don't think so .. that sounds like mtime miscompare, which is
either a problem with the filesystem or a problem with the way mtime is
stored in the tar archive.

James


-

From: Gene Heskett
Date: Monday, March 19, 2007 - 4:29 pm

Well, since the times reported by ls -l --full-time are sane
[root@coyote pix]# ls -l --full-time
total 924784
-rw-r--r-- 1 root   root     985324 2002-06-09 18:14:54.000000000 -0400 0203.jpg
[... the rest of a 100k listing, booted to 2.6.20.3-rdsl-0.31]

And:
[root@coyote pix]# ls -l --full-time
total 924784
-rw-r--r-- 1 root   root     985324 2002-06-09 18:14:54.000000000 -0400 0203.jpg

booted to 2.6.21-rc4

although the fractional second is a string of .000000000, even when
booted to a tar-unfriendly kernel, then it would tend to point at tar,
but two differently built versions of tar have been confirmed as
misbehaving in the presence of a kernel in the 2.6.21 series so far,
all of them.

I'm going to reboot twice more tonight, once to verify that the output of 
an ls -l --full-time is as I said above, I'll save this and do that again 
and clip it in after a reboot to 2.6.21-rc4, and once to 2.6.20.4-rc1 to 
see if by chance one of those patches is the guilty party.  I'll leave 
the latter running tonight for the amanda run & see what falls out.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Try to relax and enjoy the crisis.
		-- Ashleigh Brilliant
-
