Re: 2.6.35-rc2 : OOPS with LTP memcg regression test run.

From: Sachin Sant
Date: Sunday, June 6, 2010 - 8:06 am

While executing the LTP controller tests (memcg regression) on
a POWER6 box, I came across the following oops.

Memory cgroup out of memory: kill process 9139 (memcg_test_1) score 3 or a child
Killed process 9139 (memcg_test_1) vsz:3456kB, anon-rss:448kB, file-rss:1088kB
Memory cgroup out of memory: kill process 9140 (memcg_test_1) score 3 or a child
Killed process 9140 (memcg_test_1) vsz:3456kB, anon-rss:448kB, file-rss:1088kB
Unable to handle kernel paging request for data at address 0x720072007200720
Faulting instruction address: 0xc00000000015b778
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA pSeries
last sysfs file: /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map
Modules linked in: quota_v2 quota_tree ipv6 fuse loop dm_mod sr_mod cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c00000000015b778 LR: c00000000015b740 CTR: 0000000000000000
REGS: c000000009812ff0 TRAP: 0300   Tainted: G      D      (2.6.35-rc2-autotest)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44004424  XER: 00000001
DAR: 0720072007200720, DSISR: 0000000040000000
TASK = c000000005fb1100[9155] 'umount' THREAD: c000000009810000 CPU: 0
GPR00: 0000000000000000 c000000009813270 c000000000d3d7a0 0000000000000000
GPR04: 0000000000008050 0000000000160000 0000000000000027 c00000000f2c6870
GPR08: 00000000000006a5 c000000000b16870 c000000000cf0140 000000000e7b0000
GPR12: 0000000024004428 c000000007440000 0000000000008000 fffffffffffff000
GPR16: 0000000000000000 c0000000098138f0 000000000000002d 0000000000000027
GPR20: 0000000000000000 0000000000000027 0000000000000000 c000000007063138
GPR24: ffffffffffffffff 0000000000000000 c00000000019bafc c00000000e02e000
GPR28: 0000000000000001 0000000000008050 c000000000ca6b00 0720072007200720
NIP [c00000000015b778] .kmem_cache_alloc+0xb0/0x13c
LR [c00000000015b740] .kmem_cache_alloc+0x78/0x13c
Call Trace:
[c000000009813270] [c00000000015b740] .kmem_cache_alloc+0x78/0x13c (unreliable)
[c000000009813310] [c00000000019bafc] ...
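
A note on reading the oops above: the faulting address (also shown in DAR and GPR31) does not look like the kernel pointers elsewhere in the trace, which start with c000...; it is the 16-bit value 0x0720 repeated four times. That suggests the allocator followed a pointer that had been overwritten with data, though the log alone does not say by what. Below is a minimal Python sketch for spotting such a pattern in an oops line. The helper names `fault_address` and `repeating_halfword` are hypothetical and not part of any kernel tool.

```python
import re

def fault_address(oops_text):
    """Return the faulting data address from a powerpc oops line as an int, or None."""
    m = re.search(r"paging request for data at address 0x([0-9a-fA-F]+)", oops_text)
    return int(m.group(1), 16) if m else None

def repeating_halfword(addr, width_bits=64):
    """Return the 16-bit value if addr is one halfword repeated, else None."""
    halfwords = [(addr >> shift) & 0xFFFF for shift in range(0, width_bits, 16)]
    return halfwords[0] if len(set(halfwords)) == 1 else None

# Line copied from the oops in this report.
line = "Unable to handle kernel paging request for data at address 0x720072007200720"
addr = fault_address(line)
pattern = repeating_halfword(addr)
print(hex(addr), hex(pattern) if pattern is not None else None)
```

A genuine kernel text address such as the NIP value 0xc00000000015b778 makes `repeating_halfword` return None, so the check only flags addresses that look like repeated data.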
From: Al Viro
Date: Sunday, June 6, 2010 - 8:40 am

That's very odd, since
; git diff --stat 6c5de280b6..v2.6.35-rc2         
 Makefile                             |    2 +-
 drivers/gpu/drm/i915/intel_display.c |    9 +++++++
 fs/ext4/inode.c                      |   40 +++++++++++++++++++--------------
 fs/ext4/move_extent.c                |    3 ++
 4 files changed, 36 insertions(+), 18 deletions(-)
;
and none of that looks like a good candidate...
--

From: Maciej Rutecki
Date: Thursday, June 10, 2010 - 1:00 pm

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16178
for your bug report. Please add your address to the CC list there, thanks!


-- 
Maciej Rutecki
http://www.maciek.unixy.pl
--

From: KAMEZAWA Hiroyuki
Date: Thursday, June 10, 2010 - 6:35 pm

On Thu, 10 Jun 2010 22:00:57 +0200

Hmm... This looks like a panic in SLUB or SLAB.
Is the .config available?
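
The question matters because the kernel is built with exactly one slab allocator, chosen in .config. As a rough sketch, assuming the usual `CONFIG_SLAB` / `CONFIG_SLUB` / `CONFIG_SLOB` option names of kernels from this period, the selected allocator can be read out of a config file like this. The function name `slab_allocator` and the sample text are hypothetical; the sample is not the reporter's actual configuration.

```python
def slab_allocator(config_text):
    """Return 'SLAB', 'SLUB' or 'SLOB' for whichever CONFIG_ option is set to y, else None."""
    for line in config_text.splitlines():
        line = line.strip()
        for name in ("SLAB", "SLUB", "SLOB"):
            # Exact match so that options such as CONFIG_SLUB_DEBUG are ignored.
            if line == f"CONFIG_{name}=y":
                return name
    return None

# Made-up fragment for illustration only.
sample = """\
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
"""
print(slab_allocator(sample))
```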


--

From: Sachin Sant
Date: Thursday, June 10, 2010 - 10:39 pm

I think the root cause of this problem was the same as the one
mentioned in this thread (Bug kmalloc-4096 : Poison overwritten):

http://marc.info/?l=linux-kernel&m=127586004308747&w=2

I verified that the problem goes away after applying commit 386f40c.

Thanks
-Sachin 


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

--
