Re: 2.6.21-rc4-mm1

From: Andrew Morton
Date: Monday, March 19, 2007 - 9:56 pm

Temporarily at

  http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/

Will appear later at

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/



- Restored the RSDL CPU scheduler (a new version thereof)



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.
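
The "around ten rebuilds and reboots" estimate in the bisection item above is
what a binary search over the patch series costs: each build-and-boot test
halves the remaining range. A minimal Python sketch, assuming a series of 1024
patches and a hypothetical boots_ok(n) callback that reports whether the tree
with the first n patches applied works (both are illustrative and not part of
any -mm tooling):

```python
def bisect_series(num_patches, boots_ok):
    """Return (number of the first bad patch, number of kernels tested).

    boots_ok(n) is a hypothetical callback: build and boot the tree with
    the first n patches applied and return True if the bug is absent.
    Assumes boots_ok(0) is True and boots_ok(num_patches) is False.
    """
    good, bad = 0, num_patches      # invariant: `good` works, `bad` fails
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if boots_ok(mid):
            good = mid
        else:
            bad = mid
    return bad, tests               # applying patch number `bad` breaks the tree


if __name__ == "__main__":
    culprit = 700                   # pretend patch 700 of 1024 is the faulty one
    print(bisect_series(1024, lambda n: n < culprit))
```

With 1024 patches the loop runs exactly ten times, whichever patch is at fault.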


Changes since 2.6.21-rc3-mm1:


 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-arm-master.patch
 git-arm.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 ...
From: Michal Piotrowski
Date: Tuesday, March 20, 2007 - 12:54 am

Some new details about
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.2/1367.html

I can reproduce it by running this on AutoTest

for profiler in ('oprofile', ):
	try:
		print "Testing profiler %s ..." % profiler
		job.profilers.add(profiler)
		job.run_test('aiostress',)
		job.profilers.delete(profiler)
	except:
		print "Test of profiler %s failed" % profiler
		raise

I guess that oprofile triggers it.

BUG: using smp_processor_id() in preemptible [00000001] code: mount/4934
caller is avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [<c0105256>] show_trace_log_lvl+0x1a/0x2f
 [<c010597b>] show_trace+0x12/0x14
 [<c0105a3d>] dump_stack+0x16/0x18
 [<c0212f43>] debug_smp_processor_id+0xb3/0xc8
 [<c0116a26>] avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [<fdc829b9>] nmi_create_files+0x2a/0x10e [oprofile]
 [<fdc81f52>] oprofile_create_files+0xe6/0xec [oprofile]
 [<fdc82157>] oprofilefs_fill_super+0x78/0x7e [oprofile]
 [<c018296e>] get_sb_single+0x59/0x9f
 [<fdc8208f>] oprofilefs_get_sb+0x1c/0x1e [oprofile]
 [<c01823d2>] vfs_kern_mount+0x81/0xf1
 [<c0182492>] do_kern_mount+0x38/0xde
 [<c01962b1>] do_mount+0x605/0x693
 [<c01963bf>] sys_mount+0x80/0xb5
 [<c0104270>] syscall_call+0x7/0xb
 =======================

l *avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
0xc01169fb is in avail_to_resrv_perfctr_nmi_bit (/mnt/md0/devel/linux-mm/arch/i386/kernel/nmi.c:124).
119             return 0;
120     }
121
122     /* checks for a bit availability (hack for oprofile) */
123     int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
124     {
125             BUG_ON(counter > NMI_MAX_COUNTER_BITS);
126
127             return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
128     }


BUG: using smp_processor_id() in preemptible [00000001] code: mount/4934
caller is avail_to_resrv_perfctr_nmi_bit+0x2b/0x43
 [<c0105256>] show_trace_log_lvl+0x1a/0x2f
 [<c010597b>] show_trace+0x12/0x14
 [<c0105a3d>] dump_stack+0x16/0x18
 [<c0212f43>] debug_smp_processor_id+0xb3/0xc8
 ...
From: Andy Whitcroft
Date: Tuesday, March 20, 2007 - 2:45 am

[All of the below is from the pre hot-fix runs.  The very few results
which are in for the hot-fix runs seem worse if anything.  :(  All

Unsure if the above is the culprit but there seems to be a smattering of
BUGs in kernbench from the scheduler on several systems, and panics
which do not fully dump out.

elm3b239 is about 2/4 kernbench being the test in progress when we
------------[ cut here ]------------
kernel BUG at kernel/sched.c:3505!
invalid opcode: 0000 [1] SMP
last sysfs file: devices/pci0000:00/0000:00:00.0/irq
CPU 19
Modules linked in: loop dm_mod md_mod sg
Pid: 59, comm: migration/19 Not tainted 2.6.21-rc4-mm1-autokern1 #1
RIP: 0010:[<ffffffff804924f6>]  [<ffffffff804924f6>]
__sched_text_start+0x3a6/0x882
RSP: 0018:ffff810100cefe20  EFLAGS: 00010002
RAX: 000000000000008c RBX: ffff81002b0f64d8 RCX: 000000000000000c
RDX: 0000000000000000 RSI: 000000000000008c RDI: ffff81002b0f6da8
RBP: ffff810100cefeb0 R08: 000000000000008c R09: ffff81002b0f6d98
R10: 0000000000000034 R11: ffffffff8021ab20 R12: ffff81002b0f5a40
R13: 0000000000000002 R14: 000000725eb99ef7 R15: 0000000000000013
FS:  0000000000000000(0000) GS:ffff810100c42bc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002ba9c431ab70 CR3: 00000001060fc000 CR4: 00000000000006e0
Process migration/19 (pid: 59, threadinfo ffff810100cee000, task
ffff810100ced8e0)
Stack:  0000000000000001 0000000000000001 ffff81010b681e98 ffff810100ced8e0
 ffff810100cefe80 ffff810100ceda78 0000000300000000 ffff81010b681e88
 ffff81010b681e90 0000000000000286 0000000000000013 0000000000000000
Call Trace:
 [<ffffffff80224a00>] migration_thread+0x1b0/0x250
 [<ffffffff80224850>] migration_thread+0x0/0x250
 [<ffffffff8023c85b>] kthread+0xdb/0x120
 [<ffffffff8020a7a8>] child_rip+0xa/0x12
 [<ffffffff8023c780>] kthread+0x0/0x120
 [<ffffffff8020a79e>] child_rip+0x0/0x12


Code: 0f 0b eb fe 49 8b 94 24 b8 01 00 00 49 8b 84 24 b0 01 00 00
RIP  [<ffffffff804924f6>] __sched_text_start+0x3a6/0x882
 RSP ...
From: Andy Whitcroft
Date: Thursday, March 22, 2007 - 1:41 am

Well I have one result through for backing RSDL out on elm3b239 and that
does indeed seem to give us a successful boot and test.  peterz has
pointed me to an incremental patch from Con which I'll push through
testing and see if that sorts it out.

-apw
-

From: Andy Whitcroft
Date: Thursday, March 22, 2007 - 2:48 am

Ok, tested the patch below on top of 2.6.21-rc4-mm1 and this seems to
fix the problem:

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1-rsdl-0.32.patch

Hard to tell from that patch whether it will be fixed in the changes
already committed to the next -mm.

It's possible that it may be fixed by the following patch:

    sched-rsdl-improvements.patch

Which has the following slipped in at the end of the changelog:

    A tiny change checking for MAX_PRIO in normal_prio()
    may prevent oopses on bootup on large SMP due to
    forking off the idle task.

Con, are all the changes in the 0.32 patch above with akpm?

-apw
-

From: Con Kolivas
Date: Thursday, March 22, 2007 - 3:04 am

Yes he's queued everything in that patch you tested for the next -mm. Thanks 
very much for testing it.

-- 
-ck
-

From: Andy Whitcroft
Date: Thursday, March 22, 2007 - 10:07 am

No worries.  I've just got through the results on the other machine in
the mix.  That machine seems to be fixed by backing out RSDL and not by
the fixup 0.32 patch ...

This second machine seems to hang hard very soon after user space starts
executing but without a panic.  I can't say that the symptoms are very
definitive, but I do have a good result from that machine without RSDL
and not with rsdl-0.32.

The machine is a dual-core x86_64 machine: Dual Core AMD Opteron(tm)
Processor 275.

I'll let you know if I find out anything else.  Shout if you want any
information or have anything you want poked or tested.

-apw


-

From: Andy Whitcroft
Date: Thursday, March 22, 2007 - 11:17 am

Ok, I have yet a third x86_64 machine which is blowing up with the latest
2.6.21-rc4-mm1+hotfixes+rsdl-0.32 but working with
2.6.21-rc4-mm1+hotfixes-RSDL.  I have results on various hotfix levels
so I have just fired off a set of tests across the affected machines on
that latest hotfix stack plus the RSDL backout and the results should be
in in the next hour or two.

I think there is a strong correlation between RSDL and these hangs.  Any
suggestions as to the next step?

-apw
-

From: Con Kolivas
Date: Thursday, March 22, 2007 - 3:14 pm

If it's hitting the BUG_ON that I put in sched.c, which you say it is, then it 
is most certainly my fault. It implies a task has been queued without a 
corresponding bit being set anywhere in the priority bitmaps. Somehow you only 
seem to be hitting it on big(ger) SMP, which is why I haven't seen it. It 
implies some complication occurring at sched or idle init/fork with this 
accounting not working. If I could reproduce it on qemu I'd step through the 
kernel init checking where each task is being queued and see if the bitmaps 
are being set. This is obviously time consuming and laborious so I don't 
expect you to do it. 

The next best thing is if you can send me the config of one of the machines 
that's oopsing I can try that on qemu but qemu is only good at debugging 
i386. If any of the machines that were oopsing were i386 that would be very 
helpful, otherwise x86_64 is the next best. Then I need to make a creative 
debugging patch for you to try which checks every queued/dequeued task and 
dumps all that information. I don't have that patch just yet so I need to 
find enough accumulated short stints at the pc to do that (still hurts a lot 
and worsens my condition).

Thanks!

-- 
-ck
-

From: Con Kolivas
Date: Thursday, March 22, 2007 - 11:18 pm

Found a nasty in requeue_task
+	if (list_empty(old_array->queue + old_prio))
+		__clear_bit(old_prio, p->array->prio_bitmap);

see anything wrong there? I do :P

I'll queue that up with the other changes pending and hopefully that will fix 
your bug.
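
For anyone who does not see it: the test looks at old_array, but the bit is
cleared in p->array. The context lines of the later debugging patch show the
corrected form, __clear_bit(old_prio, old_array->prio_bitmap). Below is a toy
Python model of the two arrays, not kernel code. The class and function names
are made up for illustration, and it assumes p->array already points at the
array the task is moving to when the check runs. It shows how the mismatch
can leave a queued task with no bit set:

```python
class PrioArray:
    """Toy stand-in for a priority array: per-priority queues plus a bitmap."""

    def __init__(self, nprio):
        self.queue = [[] for _ in range(nprio)]
        self.bitmap = [False] * nprio

    def enqueue(self, task, prio):
        self.queue[prio].append(task)
        self.bitmap[prio] = True

    def missing_bits(self):
        """Priorities that have queued tasks but no bit set (the bad state)."""
        return [p for p, q in enumerate(self.queue) if q and not self.bitmap[p]]


def requeue(task, old_array, old_prio, new_array, new_prio, buggy):
    """Move a task between arrays, clearing the old bit if its queue emptied."""
    old_array.queue[old_prio].remove(task)
    if not old_array.queue[old_prio]:
        # The buggy variant clears the bit in the array the task is moving TO.
        target = new_array if buggy else old_array
        target.bitmap[old_prio] = False
    new_array.enqueue(task, new_prio)


def demo(buggy):
    active, expired = PrioArray(8), PrioArray(8)
    active.enqueue("A", 5)
    expired.enqueue("B", 5)         # another task at the same priority
    requeue("A", active, 5, expired, 6, buggy)
    # (is the stale bit still set in the old array?, missing bits in the new one)
    return active.bitmap[5], expired.missing_bits()
```

In this model demo(True) leaves a stale bit in the old array and task "B"
queued at priority 5 with its bit cleared, while demo(False) keeps both
bitmaps consistent with their queues.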

-- 
-ck
-

From: Andy Whitcroft
Date: Friday, March 23, 2007 - 1:45 am

Tests queued with your rsdl-0.33 patch (I am assuming it's in there).
Will let you know how it looks.

-apw

-

From: Andy Whitcroft
Date: Friday, March 23, 2007 - 5:28 am

Hmmm, this is good for the original machine (as was 0.32) but not for
either of the other two.  I am seeing panics as below on those two.

-apw

elm3b245:

NULL pointer dereference
 at 0000000000000020 RIP:
 [<ffffffff80497d94>] __sched_text_start+0x424/0x8a5
PGD 0
Oops: 0000 [1] SMP
last sysfs file: block/ram0/uevent
CPU 0
Modules linked in:
Pid: 1038, comm: udevd Not tainted 2.6.21-rc4-mm1-autokern1 #1
RIP: 0010:[<ffffffff80497d94>]  [<ffffffff80497d94>]
__sched_text_start+0x424/0x8a5
RSP: 0018:ffff81000316de68  EFLAGS: 00010017
RAX: 00000000000006c6 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000008c RDI: ffffffffffffffd0
RBP: ffff81000316def8 R08: 0000000000000064 R09: 0000000000000024
R10: ffff810001014ad8 R11: 0000000000000286 R12: ffff810001014218
R13: ffff810001013780 R14: ffff810001769450 R15: 0000000000000000
FS:  00002b75d89c66d0(0000) GS:ffffffff805aa000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
Process udevd (pid: 1038, threadinfo ffff81000316c000, task
ffff8100031cebb0)
Stack:  0000000000000000 0000000000000001 0000000000000000 ffff8100031cebb0
 ffffffffffffffd0 00000036e28ef568 ffff8100031ced48 0000000000000292
 ffff81000316def8 0000000000000246 ffff81000316def8 ffffffff8022af3d
Call Trace:
 [<ffffffff8022af3d>] put_files_struct+0xbd/0xc9
 [<ffffffff8022c773>] do_exit+0x7d2/0x7d6
 [<ffffffff8022c801>] sys_exit_group+0x0/0x14
 [<ffffffff8022c813>] sys_exit_group+0x12/0x14
 [<ffffffff8020968e>] system_call+0x7e/0x83


Code: 48 39 47 50 74 51 48 c7 47 40 00 00 00 00 8b 52 f4 48 b9 40
RIP  [<ffffffff80497d94>] __sched_text_start+0x424/0x8a5
 RSP <ffff81000316de68>
CR2: 0000000000000020
Fixing recursive fault but reboot is needed!


elm3b6:
Unable to handle kernel paging request at 000000000000fb6c RIP:
 [<ffffffff8020c573>] convert_rip_to_linear+0x53/0x91
PGD 180780067 PUD 182242067 PMD 0
Oops: 0000 [1] ...
From: Con Kolivas
Date: Friday, March 23, 2007 - 2:45 pm

This machine seems most sensitive to it (first column):
elm3b6
amd64
newisys
4cpu
config: amd64

Can you throw this debugging patch at it please? The console output might be 
very helpful. On top of sched-rsdl-0.33 thanks!

---
 kernel/sched.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Index: linux-2.6.21-rc4-mm1/kernel/sched.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/kernel/sched.c	2007-03-24 08:32:19.000000000 +1100
+++ linux-2.6.21-rc4-mm1/kernel/sched.c	2007-03-24 08:42:04.000000000 +1100
@@ -659,6 +659,25 @@ static inline void set_task_entitlement(
 	p->time_slice = p->quota;
 }
 
+static int debug_rqbitmap(struct rq *rq)
+{
+	struct list_head *queue;
+	int idx = 0, error = 0;
+	struct prio_array *array = rq->active;
+
+	for (idx = 0; idx < MAX_PRIO; idx++) {
+		queue = array->queue + idx;
+		if (!list_empty(queue)) {
+			if (!test_bit(idx, rq->dyn_bitmap)) {
+				__set_bit(idx, rq->dyn_bitmap);
+				error = 1;
+				printk(KERN_ERR "MISSING DYNAMIC BIT %d\n", idx);
+			}
+		}
+	}
+	return error;
+}
+
 /*
  * There is no specific hard accounting. The dynamic bits can have
  * false positives. rt_tasks can only be on the active queue.
@@ -679,6 +698,7 @@ static void dequeue_task(struct task_str
 	list_del_init(&p->run_list);
 	if (list_empty(p->array->queue + p->prio))
 		__clear_bit(p->prio, p->array->prio_bitmap);
+	WARN_ON(debug_rqbitmap(rq));
 }
 
 /*
@@ -797,12 +817,14 @@ static void enqueue_task(struct task_str
 {
 	__enqueue_task(p, rq);
 	list_add_tail(&p->run_list, p->array->queue + p->prio);
+	WARN_ON(debug_rqbitmap(rq));
 }
 
 static inline void enqueue_task_head(struct task_struct *p, struct rq *rq)
 {
 	__enqueue_task(p, rq);
 	list_add(&p->run_list, p->array->queue + p->prio);
+	WARN_ON(debug_rqbitmap(rq));
 }
 
 /*
@@ -820,6 +842,7 @@ static void requeue_task(struct task_str
 			__clear_bit(old_prio, old_array->prio_bitmap);
 ...
From: Con Kolivas
Date: Friday, March 23, 2007 - 4:26 pm

Better yet this one which checks the expired array as well and after 
pull_task.

If anyone's getting a bug they think might be due to rsdl please try this (on 
rsdl 0.33).

---
 kernel/sched.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Index: linux-2.6.21-rc4-mm1/kernel/sched.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/kernel/sched.c	2007-03-24 08:32:19.000000000 +1100
+++ linux-2.6.21-rc4-mm1/kernel/sched.c	2007-03-24 10:22:59.000000000 +1100
@@ -659,6 +659,35 @@ static inline void set_task_entitlement(
 	p->time_slice = p->quota;
 }
 
+static int debug_rqbitmap(struct rq *rq)
+{
+	struct list_head *queue;
+	int idx = 0, error = 0;
+	struct prio_array *array;
+
+	for (idx = 0; idx < MAX_PRIO; idx++) {
+		array = rq->active;
+		queue = array->queue + idx;
+		if (!list_empty(queue)) {
+			if (!test_bit(idx, rq->dyn_bitmap)) {
+				__set_bit(idx, rq->dyn_bitmap);
+				error = 1;
+				printk(KERN_ERR "MISSING DYNAMIC BIT %d\n", idx);
+			}
+		}
+		array = rq->expired;
+		queue = array->queue + idx;
+		if (!list_empty(queue)) {
+			if (!test_bit(idx, rq->exp_bitmap)) {
+				__set_bit(idx, rq->exp_bitmap);
+				error = 1;
+				printk(KERN_ERR "MISSING EXPIRED BIT %d\n", idx);
+			}
+		}
+	}
+	return error;
+}
+
 /*
  * There is no specific hard accounting. The dynamic bits can have
  * false positives. rt_tasks can only be on the active queue.
@@ -679,6 +708,7 @@ static void dequeue_task(struct task_str
 	list_del_init(&p->run_list);
 	if (list_empty(p->array->queue + p->prio))
 		__clear_bit(p->prio, p->array->prio_bitmap);
+	WARN_ON(debug_rqbitmap(rq));
 }
 
 /*
@@ -797,12 +827,14 @@ static void enqueue_task(struct task_str
 {
 	__enqueue_task(p, rq);
 	list_add_tail(&p->run_list, p->array->queue + p->prio);
+	WARN_ON(debug_rqbitmap(rq));
 }
 
 static inline void enqueue_task_head(struct task_struct *p, struct rq *rq)
 {
 ...
From: Andy Whitcroft
Date: Sunday, March 25, 2007 - 5:27 am

Ok, new round of tests across the sensitive machines with 0.33 plus the
above debug patch are in the queue.  Will let you know how they pan out.

The tests with -rc4 + 0.33 are also in.  Failing there also.  Both out
of __sched_text_start, so I'd guess the same cause and the scheduler is
fingered.

-apw
-

From: Torsten Kaiser
Date: Sunday, March 25, 2007 - 11:28 am

2.6.21-rc4-mm1 also fails for me.

I tried pure 2.6.21-rc4-mm1, +hotfixes, +hotfixes+rsdl33 and finally
also added the above debug patch.

The oops with the debug patch added:
[   65.426126] Freeing unused kernel memory: 312k freed
(on the console the system is starting up, getting as far as "Letting udev
process events ...")
[   66.665611] Unable to handle kernel NULL pointer dereference at
0000000000000020 RIP:
[   66.682030]  [<ffffffff8026167c>] __sched_text_start+0x4dc/0xa0e
[   66.707402] PGD 0
[   66.713473] Oops: 0000 [1] SMP
[   66.722968] last sysfs file:
devices/pci0000:00/0000:00:05.0/host2/target2:0:0/2:0:0:0/type
[   66.747954] CPU 0
[   66.754025] Modules linked in:
[   66.763209] Pid: 1200, comm: udevd Not tainted 2.6.21-rc4-mm1 #4
[   66.781162] RIP: 0010:[<ffffffff8026167c>]  [<ffffffff8026167c>]
__sched_text_start+0x4dc/0xa0e
[   66.807236] RSP: 0018:ffff81007d38fe78  EFLAGS: 00010082
[   66.823115] RAX: ffffffffffffffd0 RBX: 000000000000008c RCX: 000000000000058e
[   66.844439] RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
[   66.865767] RBP: ffff81007d38ff08 R08: 0000000000000064 R09: ffff810001014a58
[   66.887092] R10: 000000000000001c R11: 0000000000000246 R12: ffff810001013700
[   66.908418] R13: ffff810001014198 R14: 0000000000000001 R15: 0000000f859461fc
[   66.929745] FS:  00002b67df90e6d0(0000) GS:ffffffff807aa000(0000)
knlGS:0000000000000000
[   66.953950] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   66.971126] CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
[   66.992451] Process udevd (pid: 1200, threadinfo ffff81007d38e000,
task ffff81007e354100)
[   67.016915] Stack:  00000000000004b0 0000000000000000
0000000000000000 ffff81007e354100
[   67.041097]  ffffffffffffffd0 ffff81007e354298 ffff81011d420680
ffffffff802234b1
[   67.063407]  0000000000000001 0000000000000000 0000000000000000
0000000000000246
[   67.085149] Call Trace:
[   67.093037]  [<ffffffff802234b1>] filp_close+0x71/0x90
[   ...
From: Andrew Morton
Date: Sunday, March 25, 2007 - 3:01 pm

We've seen multiple reports of this.


Ah, that helps, thanks.

-

From: Con Kolivas
Date: Sunday, March 25, 2007 - 3:49 pm

The debug patch didn't do anything. This means it is not an unset bitmap 

	next = list_entry(queue->next, struct task_struct, run_list);

Urgh. Dereferencing there? That can only be next that's dereferencing, meaning 
the run_list entry is bogus. That should only ever be done under runqueue 

Thanks!

-- 
-ck
-

From: Con Kolivas
Date: Sunday, March 25, 2007 - 3:59 pm

This is about the only place I can see the run_list is looked at unlocked. Can
you see if this simple patch helps? The debug patch is unnecessary now.

Thanks!

--
Ensure checking task_queued() is only done under runqueue lock.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 kernel/sched.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.21-rc4-mm1/kernel/sched.c
===================================================================
--- linux-2.6.21-rc4-mm1.orig/kernel/sched.c	2007-03-26 08:54:15.000000000 +1000
+++ linux-2.6.21-rc4-mm1/kernel/sched.c	2007-03-26 08:55:21.000000000 +1000
@@ -3421,16 +3421,16 @@ static inline void rotate_runqueue_prior
 
 static void task_running_tick(struct rq *rq, struct task_struct *p, int tick)
 {
-	if (unlikely(!task_queued(p))) {
-		/* Task has expired but was not scheduled yet */
-		set_tsk_need_resched(p);
-		return;
-	}
 	/* SCHED_FIFO tasks never run out of timeslice. */
 	if (unlikely(p->policy == SCHED_FIFO))
 		return;
 
 	spin_lock(&rq->lock);
+	if (unlikely(!task_queued(p))) {
+		/* Task has expired but was not scheduled off yet */
+		set_tsk_need_resched(p);
+		goto out_unlock;
+	}
 	/*
 	 * Accounting is performed by both the task and the runqueue. This
 	 * allows frequently sleeping tasks to get their proper quota of


-- 
-ck
-

From: Andy Whitcroft
Date: Monday, March 26, 2007 - 12:49 am

Tests queued with this patch.  Will let you know.

-apw
-

From: Andy Whitcroft
Date: Monday, March 26, 2007 - 8:28 am

That patch had no effect on the problem.

...

Since then we have performed some more debugging on the issue and it
appears that the first stanza in next_dynamic_task is tripping,
triggering a "major_priority_rotation" and the resulting runq bitmap
indicating there is nothing to run.  Discussions with Con seem to
indicate that this is not possible :/.

Subsequent to that Con suggested testing a refactored RSDL patch.  That
patch seemed to work on the machine at hand, so tests have been
submitted for all the affected machines.

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1-rsdl-0.34-test.patch

...

Ok, the preliminary results are in and we seem to have good boots on the
three machines where I was hitting early boot oopses.  So I think we can say
that the new stack is a lot better than the old.


-apw
-

From: Con Kolivas
Date: Monday, March 26, 2007 - 9:12 am

Well thank you very much indeed. I'm pleased that the code I decided to rip 
out of the next update also took whatever bug was there with it. Fortunately 
it also is not dependent on the buggy sched: accurate user accounting patch 
that I gave up on so here is an incremental from the current -mm queue to 
this code without the "accurate user accounting patch" component for anyone 
who's trying to track just what I'm planning on moving forward with.

http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1/sched-rsdl-sd-0.35-tes...

Summary:
 3 files changed, 86 insertions(+), 249 deletions(-)

It also makes lists-add_list_splice_tail.patch unnecessary

-- 
-ck
-

From: Stephane Jourdois
Date: Tuesday, March 20, 2007 - 3:52 am

Hi,

I needed the following patch to fix this compile error (which does not
happen at first compile):

kwisatz@ambre:/usr/src/linux-2.6.21-rc4-mm1 $ rm init/missing_syscalls.h 
kwisatz@ambre:/usr/src/linux-2.6.21-rc4-mm1 $ make init
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CHK     include/linux/compile.h
  GEN     init/missing_syscalls.h
  CC      init/missing_syscalls.o
  LD      init/built-in.o
kwisatz@ambre:/usr/src/linux-2.6.21-rc4-mm1 $ cat init/.missing_syscalls.h.cmd
cmd_init/missing_syscalls.h := sed -n '/^\#define/s/[^_]*__NR_\([^[:space:]]*\).*/ \#if !defined (__NR_) \&\& !defined (__IGNORE_)
 \#warning syscall  not implemented
 \#endif/p' /usr/src/linux-2.6.21-rc4-mm1/include/asm-i386/unistd.h >init/missing_syscalls.h

# (note: all three \1 are missing, replaced by the char '^A', not visible here;
# note also that my /bin/sh is symlinked to dash (not bash) 0.5.3)

kwisatz@ambre:/usr/src/linux-2.6.21-rc4-mm1 $ make init
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
init/.missing_syscalls.h.cmd:2: *** missing separator.  Stop.
make: *** [init] Error 2


As far as I understand it, Makefile rule cmd_missing_syscalls (from
init/Makefile) is used twice in two different ways:
- At first compile:
  - run the command directly from Makefile,
  - dump this command to init/.missing_syscalls.h.cmd for further use;
- At every but first compile:
  - run existing init/.missing_syscalls.h.cmd
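
What the rule is meant to generate can be sketched in Python. The placement of
the three \1 backreferences is inferred from the gaps in the saved command
above. That some escape-interpreting step turned \1 into the byte 0x01 (^A),
and \n into the real line breaks that make trips over at line 2, is an
assumption consistent with the note about dash, not something confirmed here.
Python string literals happen to show the same effect: outside a raw string,
"\1" is an octal escape for 0x01.

```python
import re

# Intended replacement text, with the three backreferences intact (raw strings).
GOOD = (
    r"#if !defined (__NR_\1) && !defined (__IGNORE_\1)" "\n"
    r"#warning syscall \1 not implemented" "\n"
    "#endif"
)

# The same text after an escape-interpreting step has eaten the backreferences:
# in a non-raw Python literal "\1" is the single character 0x01 (^A).
BAD = (
    "#if !defined (__NR_\1) && !defined (__IGNORE_\1)\n"
    "#warning syscall \1 not implemented\n"
    "#endif"
)

# Rough equivalent of the sed pattern [^_]*__NR_\([^[:space:]]*\).*
PATTERN = re.compile(r"[^_]*__NR_(\S*).*")


def gen(line, replacement):
    """Mimic `sed -n '/^#define/s/.../.../p'` for one input line.

    Returns the generated text, or None where sed would print nothing.
    """
    if not line.startswith("#define"):
        return None
    out, count = PATTERN.subn(replacement, line, count=1)
    return out if count else None


if __name__ == "__main__":
    print(gen("#define __NR_exit 1", GOOD))
    print(repr(gen("#define __NR_exit 1", BAD)))
```

With GOOD the syscall name ends up in all three places; with BAD the name is
lost and a 0x01 byte appears instead, matching the saved command shown above.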


Can someone confirm that this is the right way to patch this ?



Thanks,
- Stéphane.

# complain-about-missing-system-calls-fix.patch
# Make generation of init/missing_syscalls.h more robust.
# Note: This fix is required only for "all but first" compilations, and
# perhaps only on some configurations (cf. /bin/sh).

Signed-off-by: Stéphane (kwisatz) Jourdois <kwisatz@rubis.org>

diff -uNr linux-2.6.21-rc4-mm1.orig/init/Makefile linux-2.6.21-rc4-mm1/init/Makefile
--- linux-2.6.21-rc4-mm1.orig/init/Makefile	2007-03-20 ...
From: Jiri Slaby
Date: Tuesday, March 20, 2007 - 7:31 am

I'm getting this while trying to swsusp:
Stopping tasks ...
Stopping kernel threads timed out after 20 seconds (1 tasks refusing to freeze):
 swapper
 Restarting tasks ... done.

What to test? Enable PM_DEBUG?

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E
-

From: Andrew Morton
Date: Tuesday, March 20, 2007 - 9:09 am

hm, OK.  Rafael has been working on fixing the process freezer and it'll
take some time to get it to where we want it to be, I expect.

Rafael, I think that we could afford to add heaps of debug in there at this
stage to help us track down problems like this.

Also, it might be useful to add a temporary /proc/freeze-unfreeze thing
which will simply do a freeze/unfreeze cycle.  Then we can apply various
workloads to the machine while madly stressing the freezer code.  
-

From: Pavel Machek
Date: Tuesday, March 20, 2007 - 11:38 am

echo testproc > /sys/power/disk; echo disk > /sys/power/state ... is
pretty much what you want.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Jiri Slaby
Date: Tuesday, March 20, 2007 - 12:40 pm

Yes, at least it happened 3 times consecutively, when I tried to asleep the

Ok, I'll try this.

thanks,
-

From: Pavel Machek
Date: Tuesday, March 20, 2007 - 12:56 pm

It will not help you -- probably -- it is equivalent to just running
s2ram. But it should make "successful" testing easier, because you no
longer need machine with working suspend to test refrigerator.
								Pavel
-

From: Jiri Slaby
Date: Tuesday, March 20, 2007 - 1:13 pm

Aha, I didn't read it carefully. Suspend is working, but not in this kernel.
I haven't tried s2ram in this version. Should I (I'm away from it) -- would
it show something?

regards,
-

From: Pavel Machek
Date: Tuesday, March 20, 2007 - 1:21 pm

No, probably not. git bisect would help, but I guess it is easier to
let Rafael sort it out.
									Pavel
-

From: Rafael J. Wysocki
Date: Tuesday, March 20, 2007 - 1:58 pm

Actually, the problem is 100% reproducible on my system too and I doubt it's
caused by the recent freezer patches.

Investigating.

Rafael
-

From: Jiri Slaby
Date: Tuesday, March 20, 2007 - 1:58 pm

I don't know what exactly you mean by recent, but 2.6.21-rc3-mm2 works
for me.

regards,
-

From: Rafael J. Wysocki
Date: Tuesday, March 20, 2007 - 2:06 pm

Thanks for the confirmation.

The patches I was talking about had already been in 2.6.21-rc3-mm2, so the
reason for this failure must be different.

Greetings,
Rafael
-

From: Rafael J. Wysocki
Date: Tuesday, March 20, 2007 - 1:12 pm

Can I see .config please?

Rafael
-

From: J.A.
Date: Tuesday, March 20, 2007 - 9:36 am

(oops, I forgot LKML)

I have no udev events for my dvd-rw...
When I insert a disc in the dvd reader:

werewolf:~# udevmonitor
udevmonitor prints the received event from the kernel [UEVENT]
and the event which udev sends out after rule processing [UDEV]

UEVENT[1174385162.607021] mount    /block/sr1 (block)
UDEV  [1174385162.610056] mount    /block/sr1 (block)

If I insert it in the dvd-rw drive, nothing happens.

extracts from dmesg:
(I have just noticed the message for the 40 wire cable, I will check)
(btw, why the h**l do ata buses start numbering at 1 and scsi ones at 0 :((((,
if ata also began at 0 life would be much easier...)

ata_piix 0000:00:1f.1: version 2.10ac1
ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
scsi0 : ata_piix
ata1.00: ATAPI, max UDMA/33
ata1.01: ATAPI, max MWDMA0, CDB intr
ata1.00: configured for UDMA/33
ata1.01: configured for PIO3
scsi1 : ata_piix
ata2.00: ATA-6: ST3120022A, 3.06, max UDMA/100
ata2.00: 234441648 sectors, multi 16: LBA48 
ata2.01: ATAPI, max UDMA/33
ata2.00: limited to UDMA/33 due to 40-wire cable
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/33
scsi 0:0:0:0: CD-ROM            HL-DT-ST DVDRAM GSA-H10N  JL10 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:0:0: Attached scsi CD-ROM sr0
scsi 0:0:1:0: Direct-Access     IOMEGA   ZIP 250          51.G PQ: 0 ANSI: 5
sd 0:0:1:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access     ATA      ST3120022A       3.06 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 1:0:0:0: [sdb] Attached SCSI disk
scsi 1:0:1:0: CD-ROM            TOSHIBA  DVD-ROM SD-M1712 1004 PQ: 0 ANSI: 5
sr1: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
sr 1:0:1:0: Attached scsi CD-ROM sr1
ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
ata3: SATA max UDMA/133 cmd ...
From: J.A.
Date: Tuesday, March 20, 2007 - 5:14 pm

I realized that my scsi devices were like this:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/.tmp-11-0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/.tmp-11-1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

After a service udev force-reload:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/sr0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/sr1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

If I insert a disc in /dev/sr1 and eject it:

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL10  /dev/sr0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/.tmp-11-1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc
[7:0:0:0]    disk    LG       USBDrive         1100  /dev/sdd

If I reload the disc in the TOSHIBA, it is automounted but the strange
device is still there.

Trying with /dev/sr0 still gives no events. What is happening here?
Is it the kernel or the udev setup?

TIA

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
-

From: Randy Dunlap
Date: Tuesday, March 20, 2007 - 10:31 am

LD      .tmp_vmlinux1
kernel/built-in.o:(.data+0xfc0): undefined reference to `maps_protect'
make: *** [.tmp_vmlinux1] Error 1

with CONFIG_PROC_FS=n

Kees?


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Kees Cook
Date: Tuesday, March 20, 2007 - 12:20 pm

Gah!  Apologies.  This should fix it, but I can't test it since I can't 
get 2.6.21-rc4-mm1 to compile (with or without this fix):

  GEN     .version
init/.missing_syscalls.h.cmd:2: *** missing separator.  Stop.
make: *** [.tmp_vmlinux1] Error 2


Signed-off-by: Kees Cook <kees@outflux.net>
---
diff -uNrp linux-2.6.21-rc4-mm1/kernel/sysctl.c linux-2.6.21-rc4-mm1-kees/kernel/sysctl.c
--- linux-2.6.21-rc4-mm1/kernel/sysctl.c	2007-03-20 10:45:06.000000000 -0700
+++ linux-2.6.21-rc4-mm1-kees/kernel/sysctl.c	2007-03-20 11:36:06.000000000 -0700
@@ -77,9 +77,12 @@ extern int pid_max_min, pid_max_max;
 extern int sysctl_drop_caches;
 extern int percpu_pagelist_fraction;
 extern int compat_log;
-extern int maps_protect;
 extern int print_fatal_signals;
 
+#ifdef CONFIG_PROC_FS
+extern int maps_protect;
+#endif
+
 #if defined(CONFIG_ADAPTIVE_READAHEAD)
 extern int readahead_ratio;
 extern int readahead_hit_rate;
@@ -619,6 +622,7 @@ static ctl_table kern_table[] = {
 		.proc_handler	= &proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_PROC_FS
 	{
 		.ctl_name       = CTL_UNNUMBERED,
 		.procname       = "maps_protect",
@@ -627,6 +631,7 @@ static ctl_table kern_table[] = {
 		.mode           = 0644,
 		.proc_handler   = &proc_dointvec,
 	},
+#endif
 
 	{ .ctl_name = 0 }
 };




-- 
Kees Cook                                            @outflux.net
-
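
For readers following the link failure above: the sysctl table still refers to
maps_protect when CONFIG_PROC_FS=n, while (presumably) the code that defines
it is not built in that configuration. The patch puts the declaration and the
table entry under the same #ifdef. A minimal userspace sketch of that pattern,
assuming hypothetical names throughout (DEMO_PROC_FS, demo_maps_protect,
demo_table and demo_has_entry are not kernel identifiers):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_PROC_FS; remove the define to model =n. */
#define DEMO_PROC_FS 1

static int demo_print_fatal_signals;

#ifdef DEMO_PROC_FS
/* Only exists when the feature that owns it is built. */
static int demo_maps_protect;
#endif

struct demo_ctl {
	const char *procname;
	int *data;
};

/* Every reference to the guarded variable sits under the same #ifdef. */
static struct demo_ctl demo_table[] = {
	{ "print_fatal_signals", &demo_print_fatal_signals },
#ifdef DEMO_PROC_FS
	{ "maps_protect", &demo_maps_protect },
#endif
	{ NULL, NULL }	/* terminator, in the spirit of { .ctl_name = 0 } */
};

/* Returns 1 if the table has an entry called name, 0 otherwise. */
static int demo_has_entry(const char *name)
{
	const struct demo_ctl *e;

	for (e = demo_table; e->procname; e++)
		if (strcmp(e->procname, name) == 0)
			return 1;
	return 0;
}
```

With the define removed, both the variable and the table entry disappear
together, so nothing is left to reference an undefined symbol.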

From: Stéphane Jourdois
Date: Tuesday, March 20, 2007 - 1:42 pm

Hi,

Would you please try the following patch, after removing
init/.missing_syscalls.h.cmd ?

http://lkml.org/lkml/2007/3/20/79

thanks.

- Stéphane.

-- 
 ///  Stephane Jourdois     /"\  ASCII RIBBON CAMPAIGN \\\
(((    Consultant securite  \ /    AGAINST HTML MAIL    )))
 \\\   24 rue Cauchy         X                         ///
  \\\  75015  Paris         / \    +33 6 8643 3085    ///
-

From: Randy Dunlap
Date: Tuesday, March 20, 2007 - 1:50 pm

Yes, that works_for_me.



---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Andrew Morton
Date: Tuesday, March 20, 2007 - 10:47 pm

How'd you manage that?

Sam, I think this is a you-thing rather than a dwmw2-thing?
-

From: David Woodhouse
Date: Wednesday, March 21, 2007 - 4:25 am

What if you remove init/.missing_syscalls.h.cmd _and_ apply the patch,
then try again?

-- 
dwmw2

-

From: Sam Ravnborg
Date: Wednesday, March 21, 2007 - 4:59 am

I will give it a shot tonight.
One issue I have with the current approach is that the ARCH-specific things are
in a single .h file.

	Sam
-

From: David Woodhouse
Date: Thursday, March 22, 2007 - 2:17 am

Que? There aren't really any arch-specific things, except for a list of
syscalls to be ignored which are i386-specific. That's because we're
pulling in the 'master' system call list from asm-i386/unistd.h, and we
need to exclude some of those which we don't really need on other
architectures.

-- 
dwmw2

-
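
A rough sketch of the idea described above: walk a master syscall list, skip
the names on an ignore list, and flag whatever the architecture still lacks.
The check in the tree is done at build time with preprocessor warnings (see
the #warning output quoted later in this thread); the runtime version below,
the demo_ names and the sample lists in the usage note are assumptions made
purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in list[0..n-1], 0 otherwise. */
static int demo_contains(const char *const *list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Count master-list syscalls that the architecture neither implements
 * nor deliberately ignores. Each such name would earn one warning.
 */
static size_t demo_count_missing(const char *const *master, size_t n_master,
				 const char *const *arch, size_t n_arch,
				 const char *const *ignored, size_t n_ignored)
{
	size_t i, missing = 0;

	for (i = 0; i < n_master; i++) {
		if (demo_contains(arch, n_arch, master[i]))
			continue;	/* implemented here */
		if (demo_contains(ignored, n_ignored, master[i]))
			continue;	/* master-list-only call, not wanted here */
		missing++;
	}
	return missing;
}
```

For example, with a master list of read, getcpu, vm86 and epoll_pwait, an
architecture implementing only read, and vm86 on the ignore list, two names
(getcpu and epoll_pwait) would be reported.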

From: Sam Ravnborg
Date: Thursday, March 22, 2007 - 4:41 am

Yep - realized this when I took a closer look.
One thing struck me. Is it correct that new things get added
to i386 first these days?
To me it looks like x86_64 is growing larger than i386 among the
developers these days, so using asm-x86_64/unistd.h could be a better choice?

	Sam
-

From: David Woodhouse
Date: Thursday, March 22, 2007 - 9:25 am

Personally I tend to do PowerPC first, but most others seem to do i386,
yes. There are still system calls being added to i386 and not x86_64...

init/missing_syscalls.h:947:3: warning: #warning syscall getcpu not implemented
init/missing_syscalls.h:950:3: warning: #warning syscall epoll_pwait not implemented
init/missing_syscalls.h:953:3: warning: #warning syscall lutimesat not implemented
init/missing_syscalls.h:956:3: warning: #warning syscall revokeat not implemented

Or perhaps the union of i386, x86_64 and powerpc. But I think i386 is
good enough for now.

-- 
dwmw2

-

From: Sam Ravnborg
Date: Thursday, March 22, 2007 - 9:28 am

I kept i386 as default so all is good.

	Sam
-

From: Sam Ravnborg
Date: Wednesday, March 21, 2007 - 3:19 pm

Took a look. Things looked pretty OK, but I applied an updated patch
to kbuild.git.
Corrected a few things in the Makefile and combined the
patches from dwmw2 and Stephane.

kbuild.git pushed out and patches follow.

	Sam
-

From: Andrew Morton
Date: Wednesday, March 21, 2007 - 4:01 pm

David has set up a git tree with this stuff, so you presumably
have an out-of-date copy.

git://git.infradead.org/~dwmw2/syscalls-2.6.git

I don't know what's changed in there.  One never does, with git
trees :(
-

From: Sam Ravnborg
Date: Thursday, March 22, 2007 - 1:54 am

I pulled that one, and the last patch in my series was from that tree.

	Sam
-

From: Randy Dunlap
Date: Tuesday, March 20, 2007 - 11:09 am

From: Randy Dunlap <randy.dunlap@oracle.com>

Avoid multiple/repeated warnings:
include/linux/utrace.h:594: warning: return type defaults to 'int'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 include/linux/utrace.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.21-rc4-mm1.orig/include/linux/utrace.h
+++ linux-2.6.21-rc4-mm1/include/linux/utrace.h
@@ -590,7 +590,7 @@ static inline void utrace_report_death(s
 {
 	BUG();
 }
-static inline utrace_report_delayed_group_leader(struct task_struct *tsk)
+static inline void utrace_report_delayed_group_leader(struct task_struct *tsk)
 {
 	BUG();
 }
-

From: Roland McGrath
Date: Tuesday, March 20, 2007 - 6:48 pm

Oops!  Thanks for catching this.


Thanks,
Roland
-

From: Adrian Bunk
Date: Tuesday, March 20, 2007 - 1:49 pm

<--  snip  -->

...
  LD      drivers/pci/hotplug/built-in.o
drivers/pci/hotplug/shpchp.o: In function `queue_pushbutton_work':(.text+0x112f): multiple definition of `queue_pushbutton_work'
drivers/pci/hotplug/pciehp.o:(.text+0x1004): first defined here
make[4]: *** [drivers/pci/hotplug/built-in.o] Error 1

<--  snip  -->

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
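
The link error above comes from two objects in the same built-in.o each
defining an external function with the same name. One common way out, and the
one the pciehp hunks in the follow-up patch take, is to give each handler its
driver's prefix. A small sketch under stated assumptions: everything named
demo_ is hypothetical, and both "drivers" live in one file here only to keep
the example short.

```c
/* Hypothetical miniature of two drivers that end up in one link unit. */
typedef void (*demo_work_fn)(int *counter);

/* Prototypes, as each driver's own header would provide them. */
void demo_pciehp_queue_pushbutton_work(int *counter);
void demo_shpchp_queue_pushbutton_work(int *counter);

/* Each handler carries its driver's prefix, so the external names differ. */
void demo_pciehp_queue_pushbutton_work(int *counter)
{
	*counter += 1;
}

void demo_shpchp_queue_pushbutton_work(int *counter)
{
	*counter += 10;
}

struct demo_slot {
	demo_work_fn work;
};

/* Loosely mirrors the role of INIT_DELAYED_WORK: bind a handler to a slot. */
static void demo_init_slot(struct demo_slot *slot, demo_work_fn fn)
{
	slot->work = fn;
}
```

If both functions were called queue_pushbutton_work and neither was static,
the linker would report a multiple definition, as in the log above.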

From: Kristen Carlson Accardi
Date: Wednesday, March 21, 2007 - 11:45 am

Fix duplicate names in shpchp and pciehp.

Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Index: 2.6-mm/drivers/pci/hotplug/pciehp.h
===================================================================
--- 2.6-mm.orig/drivers/pci/hotplug/pciehp.h
+++ 2.6-mm/drivers/pci/hotplug/pciehp.h
@@ -158,7 +158,7 @@ extern u8 pciehp_handle_presence_change(
 extern u8 pciehp_handle_power_fault(u8 hp_slot, struct controller *ctrl);
 extern int pciehp_configure_device(struct slot *p_slot);
 extern int pciehp_unconfigure_device(struct slot *p_slot);
-extern void queue_pushbutton_work(struct work_struct *work);
+extern void pciehp_queue_pushbutton_work(struct work_struct *work);
 int pcie_init(struct controller *ctrl, struct pcie_device *dev);
 
 static inline struct slot *pciehp_find_slot(struct controller *ctrl, u8 device)
Index: 2.6-mm/drivers/pci/hotplug/pciehp_core.c
===================================================================
--- 2.6-mm.orig/drivers/pci/hotplug/pciehp_core.c
+++ 2.6-mm/drivers/pci/hotplug/pciehp_core.c
@@ -229,7 +229,7 @@ static int init_slots(struct controller 
 		slot->hpc_ops = ctrl->hpc_ops;
 		slot->number = ctrl->first_slot;
 		mutex_init(&slot->lock);
-		INIT_DELAYED_WORK(&slot->work, queue_pushbutton_work);
+		INIT_DELAYED_WORK(&slot->work, pciehp_queue_pushbutton_work);
 
 		/* register this slot with the hotplug pci core */
 		hotplug_slot->private = slot;
Index: 2.6-mm/drivers/pci/hotplug/pciehp_ctrl.c
===================================================================
--- 2.6-mm.orig/drivers/pci/hotplug/pciehp_ctrl.c
+++ 2.6-mm/drivers/pci/hotplug/pciehp_ctrl.c
@@ -351,7 +351,7 @@ static void pciehp_power_thread(struct w
 	kfree(info);
 }
 
-void queue_pushbutton_work(struct work_struct *work)
+void pciehp_queue_pushbutton_work(struct work_struct *work)
 {
 	struct slot *p_slot = container_of(work, struct slot, work.work);
 	struct power_work_info *info;
Index: ...
From: J.A.
Date: Tuesday, March 20, 2007 - 2:04 pm

After applying hot-fixes, I get this:

MODPOST vmlinux
WARNING: init/built-in.o - Section mismatch: reference to .init.text: from .text between 'rest_init' (at offset 0xfa) and 'try_name'
WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:cpu_set_gdt from .text between 'initialize_secondary' (at offset 0xbce3) and 'mp_find_ioapic'
WARNING: mm/built-in.o - Section mismatch: reference to .init.data:initkmem_list3 from .text between 'set_up_list3s' (at offset 0x1b384) and 's_start'

If you need anything, just ask (.config or the like)

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
-

From: Stefan Richter
Date: Tuesday, March 20, 2007 - 4:10 pm

...

Just a note for readers of lkml:  git-ieee1394.patch is steadily growing
thanks to Kristian Høgsberg's work on his new alternative FireWire
drivers.  Recently Kristian posted preliminary patches to the popular
low-level FireWire libraries libraw1394 and libdc1394, making them
interoperable with his newly designed kernel--userspace ABI.  (Mainline
Linux' IEEE 1394 subsystem features a slightly unfortunate variety of
userspace ABIs, some of them abstracted by the mentioned libraries, some
directly used.)  I heard Kristian also already worked on integration
with HAL, i.e. there are now more and more pieces of the puzzle coming
together.
-- 
Stefan Richter
-=====-=-=== --== =-=--
http://arcgraph.de/sr/
-

From: Randy Dunlap
Date: Tuesday, March 20, 2007 - 4:49 pm

I think that this:

config EEPROM_93CX6
	tristate "EEPROM 93CX6 support"
	---help---
	This is a driver for the EEPROM chipsets 93c46 and 93c66.
	The driver supports both read as well as write commands.

should not be in lib/Kconfig.  lib/ is not for drivers.
or (simpler) s/driver/library/
but I think I'd rather see it in drivers/misc/.


and the help text needs to be indented 2 more spaces...

---
~Randy
boilerplate:
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Randy Dunlap
Date: Tuesday, March 20, 2007 - 6:47 pm

UIO_CIF should depend on PCI ??

With CONFIG_PCI=n, I get:

ERROR: "pci_module_init" [drivers/uio/uio_cif.ko] undefined!
ERROR: "pci_release_regions" [drivers/uio/uio_cif.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Greg KH
Date: Wednesday, March 21, 2007 - 11:36 am

Thanks, I've made that change now.

greg k-h
-

From: Reuben Farrelly
Date: Wednesday, March 21, 2007 - 3:14 am

Just booted into this kernel, and hit this, which locked up the machine:

This is tornado.reub.net (Linux x86_64 2.6.21-rc4-mm1) 20:16:58

tornado login: ------------[ cut here ]------------
kernel BUG at kernel/sched.c:3505!
invalid opcode: 0000 [1] SMP
last sysfs file: devices/pci0000:00/0000:00:1f.3/i2c-adapter/i2c-0/0-002e/pwm3
CPU 1
Modules linked in: firmware_class eeprom lm85 hwmon_vid i2c_i801 8021q 
iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nfnetlink 
iptable_mangle ip_tables nfs lockd sunrpc ohci1394 ieee1394 usb_storage
Pid: 8250, comm: clamd Not tainted 2.6.21-rc4-mm1 #1
RIP: 0010:[<ffffffff8025d2cb>]  [<ffffffff8025d2cb>] __sched_text_start+0x3cb/0x8b3
RSP: 0000:ffff8100023cfee0  EFLAGS: 00010002
RAX: 000000000000008c RBX: ffff810001e040e8 RCX: 000000000000000c
RDX: 0000000000000000 RSI: 000000000000008c RDI: ffff810001e049b8
RBP: ffff8100023cff70 R08: 000000000000008c R09: ffff810001e049a8
R10: 0000000000000034 R11: 0000000000000000 R12: ffff810001e03f00
R13: 0000000000000002 R14: 00000000ffffffff R15: 000000521b55f827
FS:  00002b1dfda2ec00(0000) GS:ffff81000208ec40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaafcf000 CR3: 0000000004ac3000 CR4: 00000000000006e0
Process clamd (pid: 8250, threadinfo ffff8100023ce000, task ffff810004c090a0)
Stack:  ffff810004c090a0 ffffffff8025fdb7 ffff810004c090a0 00007fffae43e955
  ffff810004c09248 00000001023cff28 ffffffff8029635d 0000000000c5aac0
  0000000000000005 00002b1dfc7d6d5a ffffffff8025fdb7 0000000000000000
Call Trace:
  [<ffffffff8025fdb7>] trace_hardirqs_on_thunk+0x35/0x37
  [<ffffffff8029635d>] trace_hardirqs_on+0x12a/0x15d
  [<ffffffff8025fdb7>] trace_hardirqs_on_thunk+0x35/0x37
  [<ffffffff8025a7e0>] retint_careful+0x12/0x2e


Code: 0f 0b eb fe 49 8b 94 24 e0 01 00 00 49 8b 84 24 d8 01 00 00
RIP  [<ffffffff8025d2cb>] __sched_text_start+0x3cb/0x8b3
  RSP <ffff8100023cfee0>
BUG: spinlock lockup on CPU#0, swapper/0, ffff810001e03f00
BUG: ...
From: J.A.
Date: Thursday, March 22, 2007 - 4:27 pm

Is anybody having problems with optical drives and this kernel ?
I can't get my dvdrw to spit any events to udevmonitor. If I mount it
manually everything works fine.

Perhaps the problem is in hal/g-v-m or something else, but I suppose that
udevmonitor receives events directly from the kernel, doesn't it?

TIA

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT
-

From: Andrew Morton
Date: Thursday, March 22, 2007 - 6:41 pm

Please always do reply-to-all.



Probably related to the not-yet-completely-solved firmware loader failures.

It would be good if someone could do a bisection search on this.  I face a
fun evening hunting down a horrendous ext3 performance regression which is
now in mainline.

-

From: J.A.
Date: Monday, March 26, 2007 - 1:31 pm

Finally, this was a userspace problem (hal):

http://lists.freedesktop.org/archives/hal/2007-March/007545.html

What I don't understand is this: I supposed that udev (and so udevmonitor)
is independent of hal; more or less, hal monitors udev events and does things
like looking up the disc label and so on.

But I do not get any events in udevmonitor if I'm not logged in to gnome.
How's this?

TIA

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
-

From: Adrian Bunk
Date: Saturday, March 24, 2007 - 6:06 am

check_bug_kill() is no longer used.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 arch/i386/lguest/interrupts_and_traps.c |    2 ++
 arch/i386/lguest/lg.h                   |    1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

--- linux-2.6.21-rc4-mm1/arch/i386/lguest/lg.h.old	2007-03-23 23:17:05.000000000 +0100
+++ linux-2.6.21-rc4-mm1/arch/i386/lguest/lg.h	2007-03-23 23:17:10.000000000 +0100
@@ -195,7 +195,6 @@
 /* interrupts_and_traps.c: */
 void maybe_do_interrupt(struct lguest *lg);
 int deliver_trap(struct lguest *lg, unsigned int num);
-void check_bug_kill(struct lguest *lg);
 void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
 void pin_stack_pages(struct lguest *lg);
 void pin_trap_pages(struct lguest *lg);
--- linux-2.6.21-rc4-mm1/arch/i386/lguest/interrupts_and_traps.c.old	2007-03-23 23:17:18.000000000 +0100
+++ linux-2.6.21-rc4-mm1/arch/i386/lguest/interrupts_and_traps.c	2007-03-23 23:17:38.000000000 +0100
@@ -118,6 +118,7 @@
 	return 1;
 }
 
+#if 0
 void check_bug_kill(struct lguest *lg)
 {
 #ifdef CONFIG_BUG
@@ -144,6 +145,7 @@
 	}
 #endif	/* CONFIG_BUG */
 }
+#endif  /*  0  */
 
 static int direct_trap(const struct lguest *lg,
 		       const struct desc_struct *trap,

-

From: Rusty Russell
Date: Sunday, March 25, 2007 - 12:33 am

Thanks Adrian, that was actually an oversight.  However, this function
is most useful in early bringup, so I didn't notice it was gone.

I'd prefer a patch which eliminates it altogether, rather than #if 0 it
out.

Thanks!
Rusty.


-

From: Adrian Bunk
Date: Sunday, March 25, 2007 - 7:57 am

cu
Adrian


<--  snip  -->


check_bug_kill() is no longer used.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 arch/i386/lguest/interrupts_and_traps.c |   27 ------------------------
 arch/i386/lguest/lg.h                   |    1 
 2 files changed, 28 deletions(-)

--- linux-2.6.21-rc4-mm1/arch/i386/lguest/lg.h.old	2007-03-25 14:38:22.000000000 +0200
+++ linux-2.6.21-rc4-mm1/arch/i386/lguest/lg.h	2007-03-25 14:41:13.000000000 +0200
@@ -195,7 +195,6 @@
 /* interrupts_and_traps.c: */
 void maybe_do_interrupt(struct lguest *lg);
 int deliver_trap(struct lguest *lg, unsigned int num);
-void check_bug_kill(struct lguest *lg);
 void load_guest_idt_entry(struct lguest *lg, unsigned int i, u32 low, u32 hi);
 void pin_stack_pages(struct lguest *lg);
 void pin_trap_pages(struct lguest *lg);
--- linux-2.6.21-rc4-mm1/arch/i386/lguest/interrupts_and_traps.c.old	2007-03-25 14:38:46.000000000 +0200
+++ linux-2.6.21-rc4-mm1/arch/i386/lguest/interrupts_and_traps.c	2007-03-25 14:41:25.000000000 +0200
@@ -118,33 +118,6 @@
 	return 1;
 }
 
-void check_bug_kill(struct lguest *lg)
-{
-#ifdef CONFIG_BUG
-	u32 eip = lg->regs->eip - PAGE_OFFSET;
-	u16 insn;
-
-	/* This only works for addresses in linear mapping... */
-	if (lg->regs->eip < PAGE_OFFSET)
-		return;
-	lhread(lg, &insn, eip, sizeof(insn));
-	if (insn == 0x0b0f) {
-#ifdef CONFIG_DEBUG_BUGVERBOSE
-		u16 l;
-		u32 f;
-		char file[128];
-		lhread(lg, &l, eip+sizeof(insn), sizeof(l));
-		lhread(lg, &f, eip+sizeof(insn)+sizeof(l), sizeof(f));
-		lhread(lg, file, f - PAGE_OFFSET, sizeof(file));
-		file[sizeof(file)-1] = 0;
-		kill_guest(lg, "BUG() at %#x %s:%u", eip, file, l);
-#else
-		kill_guest(lg, "BUG() at %#x", eip);
-#endif	/* CONFIG_DEBUG_BUGVERBOSE */
-	}
-#endif	/* CONFIG_BUG */
-}
-
 static int direct_trap(const struct lguest *lg,
 		       const struct desc_struct *trap,
 		       unsigned int num)

-

From: Adrian Bunk
Date: Saturday, March 24, 2007 - 6:06 am

This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 drivers/scsi/constants.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.21-rc4-mm1/drivers/scsi/constants.c.old	2007-03-23 23:26:39.000000000 +0100
+++ linux-2.6.21-rc4-mm1/drivers/scsi/constants.c	2007-03-23 23:26:55.000000000 +0100
@@ -1235,7 +1235,7 @@
 }
 EXPORT_SYMBOL(scsi_print_sense_hdr);
 
-void
+static void
 scsi_decode_sense_buffer(const unsigned char *sense_buffer, int sense_len,
 		       struct scsi_sense_hdr *sshdr)
 {
@@ -1258,7 +1258,7 @@
 	}
 }
 
-void
+static void
 scsi_decode_sense_extras(const unsigned char *sense_buffer, int sense_len,
 			 struct scsi_sense_hdr *sshdr)
 {

-

From: Douglas Gilbert
Date: Saturday, March 24, 2007 - 9:11 am

Adrian,
Who put those functions in?

The names and arguments look very similar to these
exported functions in scsi_error.c *** :
  scsi_normalize_sense
  scsi_sense_desc_find
  scsi_get_sense_info_fld

that I can see in 2.6.21-rc4

The proposed scsi_decode_sense_buffer() looks broken because
it can fail and should return an int reflecting that.
How scsi_decode_sense_extras() works is intriguing, unless
struct scsi_sense_hdr has been changed as well.


*** Putting sense decode logic in scsi_error.c is wrong
because:
  - the ATA command set is proposing an ATA REQUEST SENSE
    command to yield a sense buffer
  - sense buffers don't necessarily indicate errors.

So moving those functions out of scsi_error.c IMO is a
good idea. Breaking them in the move isn't.

Doug Gilbert


-
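
To make the point about failure reporting concrete, here is a userspace
sketch of a sense-buffer decoder that returns an int, so a caller can tell
"no valid sense data" apart from a decoded header. It is not the code from
constants.c or scsi_error.c; struct demo_sense_hdr and demo_decode_sense are
hypothetical names, and the layout handled is only the basic SPC one
(response codes 0x70 to 0x73, fixed versus descriptor format).

```c
#include <stddef.h>
#include <string.h>

struct demo_sense_hdr {
	unsigned char response_code;	/* 0x70..0x73 when valid */
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/* Returns 1 if sb holds valid sense data and hdr was filled in, else 0. */
static int demo_decode_sense(const unsigned char *sb, int sb_len,
			     struct demo_sense_hdr *hdr)
{
	memset(hdr, 0, sizeof(*hdr));

	if (!sb || sb_len < 1)
		return 0;

	hdr->response_code = sb[0] & 0x7f;
	if (hdr->response_code < 0x70 || hdr->response_code > 0x73)
		return 0;	/* not sense data at all */

	if (hdr->response_code >= 0x72) {
		/* descriptor format: key, ASC, ASCQ in bytes 1..3 */
		if (sb_len > 1)
			hdr->sense_key = sb[1] & 0x0f;
		if (sb_len > 2)
			hdr->asc = sb[2];
		if (sb_len > 3)
			hdr->ascq = sb[3];
	} else {
		/* fixed format: key in byte 2, ASC/ASCQ in bytes 12/13 */
		if (sb_len > 2)
			hdr->sense_key = sb[2] & 0x0f;
		if (sb_len > 13) {
			hdr->asc = sb[12];
			hdr->ascq = sb[13];
		}
	}
	return 1;
}
```

A void-returning variant would leave the caller with a zeroed header and no
way to know whether that means "no sense" or "could not decode".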

From: Adrian Bunk
Date: Saturday, March 24, 2007 - 10:02 am

[SCSI] constants.c: cleanup, verbose result printing

From: Martin K. Petersen

Clean up constants.c and make result printing more user friendly:

 - Refactor the command and sense functions so that the actual
   formatting can be called from the various helper functions with the
   correct prefix.

 - Replace scsi_print_hostbyte() and scsi_print_driverbyte() with
   scsi_print_result() which is verbose when CONFIG_SCSI_CONSTANTS is
   on.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Adrian Bunk
Date: Saturday, March 24, 2007 - 6:07 am

bio_{,un}map_user no longer have any modular users.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---
--- linux-2.6.21-rc4-mm1/fs/bio.c.old	2007-03-24 11:42:28.000000000 +0100
+++ linux-2.6.21-rc4-mm1/fs/bio.c	2007-03-24 11:42:39.000000000 +0100
@@ -1253,8 +1253,6 @@
 EXPORT_SYMBOL(bio_add_page);
 EXPORT_SYMBOL(bio_add_pc_page);
 EXPORT_SYMBOL(bio_get_nr_vecs);
-EXPORT_SYMBOL(bio_map_user);
-EXPORT_SYMBOL(bio_unmap_user);
 EXPORT_SYMBOL(bio_map_kern);
 EXPORT_SYMBOL(bio_pair_release);
 EXPORT_SYMBOL(bio_split);

-

From: Adrian Bunk
Date: Saturday, March 24, 2007 - 6:07 am

This patch contains the following:
- every file should #include the headers containing the prototypes for
  its global functions
- fix the wrong return type of sys_frevoke(), which gcc is now able to detect
- make 2 needlessly global structs static

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

 fs/revoke.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- linux-2.6.21-rc4-mm1/fs/revoke.c.old	2007-03-23 23:31:46.000000000 +0100
+++ linux-2.6.21-rc4-mm1/fs/revoke.c	2007-03-23 23:50:39.000000000 +0100
@@ -16,6 +16,7 @@
 #include <linux/mount.h>
 #include <linux/sched.h>
 #include <linux/revoked_fs_i.h>
+#include <linux/syscalls.h>
 
 /*
  * This is used for pre-allocating an array of file pointers so that we don't
@@ -28,7 +29,7 @@
 	unsigned long restore_start;
 };
 
-struct kmem_cache *revokefs_inode_cache;
+static struct kmem_cache *revokefs_inode_cache;
 
 /*
  * Revoked file descriptors point to inodes in the revokefs filesystem.
@@ -551,7 +552,7 @@
 	return err;
 }
 
-asmlinkage int sys_frevoke(unsigned int fd)
+asmlinkage long sys_frevoke(unsigned int fd)
 {
 	struct file *file = fget(fd);
 	int err = -EBADF;
@@ -618,7 +619,7 @@
 			     REVOKEFS_MAGIC, mnt);
 }
 
-struct file_system_type revokefs_fs_type = {
+static struct file_system_type revokefs_fs_type = {
 	.name = "revokefs",
 	.get_sb = revokefs_get_sb,
 	.kill_sb = kill_anon_super

-
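
Why the extra #include matters: once the defining file sees the shared
prototype, the compiler can compare the two and reject a mismatch such as
int versus long. A minimal sketch, assuming hypothetical names
(demo_sys_frevoke, DEMO_MAX_FD) and a made-up validity rule; it is not the
code from fs/revoke.c.

```c
#include <errno.h>

/* Stand-in for the prototype a shared header would provide. */
long demo_sys_frevoke(unsigned int fd);

/* Hypothetical limit, only so the function has something to check. */
#define DEMO_MAX_FD 16u

/*
 * The definition is compiled with the prototype above in scope. Had it been
 * written as "int demo_sys_frevoke(unsigned int fd)", the compiler would
 * stop with a conflicting-types error instead of silently accepting it.
 */
long demo_sys_frevoke(unsigned int fd)
{
	if (fd >= DEMO_MAX_FD)
		return -EBADF;
	return 0;
}
```

Without the prototype in scope, the mismatch would only show up (if at all)
at the call sites that do include the header.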

From: Pekka Enberg
Date: Saturday, March 24, 2007 - 6:15 am

Looks good. Thanks!

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
-

From: Adrian Bunk
Date: Sunday, March 25, 2007 - 7:58 am

I was looking at the following section error:

<--  snip  -->

WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:eisa_root_register from .text between 'pci_eisa_init' (at offset 0xabf670) and 'virtual_eisa_release'

<--  snip  -->

AFAIK a PCI to EISA bridge isn't anything hotpluggable, so 
pci_eisa_init() can become __init.

Signed-off-by: Adrian Bunk <bunk@stusta.de>

---
--- linux-2.6.21-rc4-mm1/drivers/eisa/pci_eisa.c.old	2007-03-25 15:51:01.000000000 +0200
+++ linux-2.6.21-rc4-mm1/drivers/eisa/pci_eisa.c	2007-03-25 15:51:17.000000000 +0200
@@ -19,8 +19,8 @@
 /* There is only *one* pci_eisa device per machine, right ? */
 static struct eisa_root_device pci_eisa_root;
 
-static int __devinit pci_eisa_init (struct pci_dev *pdev,
-				    const struct pci_device_id *ent)
+static int __init pci_eisa_init(struct pci_dev *pdev,
+				const struct pci_device_id *ent)
 {
 	int rc;
 

-

From: J.A.
Date: Sunday, March 25, 2007 - 5:24 pm

Libata seems to misdetect my cable.
I have double-checked and the cable is 80-wire...

ata1 is PATA ICH5 bus 1 with DVD-RW + ZIP and 40-wire cable
ata2 is PATA ICH5 bus 2 with extra HD + DVD and 80-wire cable
ata3 is real SATA ICH5 with boot HD

(mm, I changed BIOS settings to get the box booting from the SATA disk)

werewolf:~# lsscsi
[0:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-H10N  JL12  /dev/.tmp-11-0
[0:0:1:0]    disk    IOMEGA   ZIP 250          51.G  /dev/sda
[1:0:0:0]    disk    ATA      ST3120022A       3.06  /dev/sdb
[1:0:1:0]    cd/dvd  TOSHIBA  DVD-ROM SD-M1712 1004  /dev/sr1
[2:0:0:0]    disk    ATA      ST3200822AS      3.01  /dev/sdc

ata_piix 0000:00:1f.1: version 2.10ac1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1f.1 to 64
ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
scsi0 : ata_piix
ata1.00: ATAPI, max UDMA/33
ata1.01: ATAPI, max MWDMA0, CDB intr
ata1.00: configured for UDMA/33
ata1.01: configured for PIO3
scsi1 : ata_piix
ata2.00: ATA-6: ST3120022A, 3.06, max UDMA/100
ata2.00: 234441648 sectors, multi 16: LBA48
ata2.01: ATAPI, max UDMA/33
ata2.00: limited to UDMA/33 due to 40-wire cable    <=======================
ata2.00: configured for UDMA/33
ata2.01: configured for UDMA/33
scsi 0:0:0:0: CD-ROM            HL-DT-ST DVDRAM GSA-H10N  JL12 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:0:0: Attached scsi CD-ROM sr0
scsi 0:0:1:0: Direct-Access     IOMEGA   ZIP 250          51.G PQ: 0 ANSI: 5
sd 0:0:1:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access     ATA      ST3120022A       3.06 PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 234441648 512-byte hardware sectors (120034 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, ...
From: Tejun Heo
Date: Monday, March 26, 2007 - 4:01 am

Does the following patch fix your problem?

  http://article.gmane.org/gmane.linux.ide/17444

(You can get the raw message by appending /raw to the URL).

-- 
tejun
-

From: J.A.
Date: Monday, March 26, 2007 - 1:18 pm

Yes it works !!

Disk is back at nice speed of 50 Mb/s.

dmesg:

ata_piix 0000:00:1f.1: version 2.10ac1
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1f.1 to 64
ata1: PATA max UDMA/133 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14
ata2: PATA max UDMA/133 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15
scsi0 : ata_piix
ata1.00: ATAPI, max UDMA/33
ata1.01: ATAPI, max MWDMA0, CDB intr
ata1.00: configured for UDMA/33
ata1.01: configured for PIO3
scsi1 : ata_piix
ata2.00: ATA-6: ST3120022A, 3.06, max UDMA/100
ata2.00: 234441648 sectors, multi 16: LBA48 
ata2.01: ATAPI, max UDMA/33
ata2.00: configured for UDMA/100    <=============
ata2.01: configured for UDMA/33     <=============

Thanks !!

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam05 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #2 SMP PREEMPT
-
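
For context on the two dmesg excerpts in this subthread: UDMA modes faster
than UDMA/33 need an 80-wire cable, so a driver that believes the cable is
40-wire caps the transfer mode. The sketch below shows only that capping
rule; it is not libata code, and demo_limit_udma and enum demo_cable are
hypothetical names.

```c
/* UDMA mode numbers: UDMA/33 is mode 2, UDMA/100 is mode 5, UDMA/133 is mode 6. */
enum demo_cable {
	DEMO_CABLE_40_WIRE,
	DEMO_CABLE_80_WIRE
};

/* Highest UDMA mode usable given host, device and detected cable type. */
static int demo_limit_udma(int host_max, int dev_max, enum demo_cable cable)
{
	int mode = host_max < dev_max ? host_max : dev_max;

	/* A 40-wire cable cannot carry anything faster than UDMA/33. */
	if (cable == DEMO_CABLE_40_WIRE && mode > 2)
		mode = 2;
	return mode;
}
```

With a UDMA/133 host and a UDMA/100 disk this gives mode 2 (UDMA/33) when the
cable is taken to be 40-wire and mode 5 (UDMA/100) when it is detected as
80-wire, which matches the before and after logs.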

From: Badari Pulavarty
Date: Monday, March 26, 2007 - 12:47 pm

CC      arch/powerpc/kernel/ibmebus.o
arch/powerpc/kernel/ibmebus.c:463: error: ‘of_device_uevent’ undeclared here (not in a function)
make[1]: *** [arch/powerpc/kernel/ibmebus.o] Error 1
make: *** [arch/powerpc/kernel] Error 2

Patch causing the problem in -mm:
	ibmebus-uevent-support.patch

I don't see where ‘of_device_uevent’ is defined :(

Thanks,
Badari

-

From: Paul Mackerras
Date: Monday, March 26, 2007 - 4:29 pm

[Empty message]
From: Badari Pulavarty
Date: Monday, March 26, 2007 - 1:05 pm

# make -j8 modules
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  LD [M]  sound/soundcore.o
  CC [M]  sound/ppc/beep.o
sound/ppc/beep.c: In function ‘snd_pmac_attach_beep’:
sound/ppc/beep.c:224: error: dereferencing pointer to incomplete type
sound/ppc/beep.c:242: error: dereferencing pointer to incomplete type
sound/ppc/beep.c:265: error: dereferencing pointer to incomplete type
sound/ppc/beep.c: In function ‘snd_pmac_detach_beep’:
sound/ppc/beep.c:275: error: dereferencing pointer to incomplete type
make[2]: *** [sound/ppc/beep.o] Error 1
make[1]: *** [sound/ppc] Error 2
make: *** [sound] Error 2
make: *** Waiting for unfinished jobs....


Patch that is causing the problem in -mm:
gregkh-pci-pci-cleanup-the-includes-of-linux-pcih.patch

sound/ppc/beep.c needs to include <linux/pci.h>

Thanks,
Badari
	

-

From: Jean Delvare
Date: Monday, March 26, 2007 - 12:35 pm

Hi Badari,


Good catch, thanks for reporting. I expected a few false positives to
be left in my patch, but couldn't test everything.

Greg, please update your copy with this version of the patch. The only
change is that sound/ppc/beep.c is removed from the patch.

* * * * *

I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurrence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.
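
The search heuristic just described can be sketched as follows. This is a
guess at an equivalent check, not the tooling actually used for the patch;
demo_count and demo_pci_include_looks_unneeded are hypothetical names, and the
function works on a source file already loaded into a string.

```c
#include <string.h>

/* Count non-overlapping occurrences of needle in text. */
static int demo_count(const char *text, const char *needle)
{
	int n = 0;
	size_t len = strlen(needle);
	const char *p;

	for (p = strstr(text, needle); p; p = strstr(p + len, needle))
		n++;
	return n;
}

/*
 * Returns 1 if src includes <linux/pci.h> but contains no other "pci" or
 * "PCI", i.e. the include is a candidate for removal. Candidates still
 * need a compile test, since indirect users show up as false positives.
 */
static int demo_pci_include_looks_unneeded(const char *src)
{
	int includes = demo_count(src, "#include <linux/pci.h>");

	if (includes == 0)
		return 0;
	/* Each include line itself accounts for exactly one "pci". */
	return demo_count(src, "pci") == includes &&
	       demo_count(src, "PCI") == 0;
}
```

A file such as sound/ppc/beep.c passes this textual test yet still fails to
build without the header, which is why the compile step remains necessary.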

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test ...
From: Greg KH
Date: Monday, March 26, 2007 - 4:26 pm

Done.

thanks,

greg k-h
-

From: Badari Pulavarty
Date: Monday, March 26, 2007 - 2:57 pm

Panics my x86-64 box. 2.6.21-rc4 works fine.
Ideas on where to start ? Bisect ?

Thanks,
Badari

..
ReiserFS: hda2: found reiserfs format "3.6" with standard journal
ReiserFS: hda2: using ordered data mode
ReiserFS: hda2: journal params: device hda2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: hda2: checking transaction log (hda2)
ReiserFS: hda2: Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 376k freed
INIT: version 2.86 booting
Unable to handle kernel NULL pointer dereference at 0000000000000020
RIP:
 [<ffffffff804ec090>] __sched_text_start+0x460/0x889
PGD 1c1898067 PUD 1c1897067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: block/hda/range
CPU 3
Modules linked in:
Pid: 900, comm: boot Not tainted 2.6.21-rc4-mm1 #1
RIP: 0010:[<ffffffff804ec090>]  [<ffffffff804ec090>] __sched_text_start+0x460/0x889
RSP: 0018:ffff8101014dfee0  EFLAGS: 00010086
RAX: 0000000000000001 RBX: ffff8101c0010218 RCX: 0000000000000000
RDX: ffff8101c0010ae8 RSI: 0000000000000000 RDI: ffffffffffffffd0
RBP: ffff8101014dff70 R08: 000000000000008c R09: ffff8101c0010ad8
R10: 000000000000001c R11: ffffffff802099be R12: ffff8101c000f780
R13: 0000000000000001 R14: 0000000a7bcffd6e R15: 0000000000000003
FS:  00002b9ef1d40ae0(0000) GS:ffff8101c07b6e40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000100f8d000 CR4: 00000000000006e0
Process boot (pid: 900, threadinfo ffff8101014de000, task ffff8101014dd490)
Stack:  0000038c802848cf ffff8101014dfef8 ffffffff80238d82 ffff8101014dd490
 ffffffffffffffd0 ffff8101014dd630 0000000000000000 00007fffb955ff80
 00007fffb9560090 0000000000000000 ffffffff802099be ffff8101014dff48
Call Trace:
 [<ffffffff80238d82>] recalc_sigpending+0x12/0x20
 [<ffffffff802099be>] system_call+0x7e/0x83
 [<ffffffff80207ef3>] sys_clone+0x23/0x30
 [<ffffffff80209a28>] ...
From: Andrew Morton
Date: Monday, March 26, 2007 - 3:22 pm

On Mon, 26 Mar 2007 13:57:57 -0800

This is a very popular oops, caused by the rsdl scheduler.  I don't _think_
we yet know exactly why it is happening.  Con, did you get to the bottom
of this?

We don't know why it confused kallsyms either.
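
For readers unfamiliar with the trace format: each frame has the form `[<address>] symbol+0xoffset/0xsize`. In the oops above, the faulting frame resolves to `__sched_text_start`, which is the marker for the start of the scheduler text section rather than the name of a function, so the trace does not say which scheduler function faulted. The following is a minimal sketch for decoding such a frame; `FRAME_RE` and `parse_frame` are hypothetical names and this is not part of any kernel tooling.

```python
import re

# One oops frame: "[<hex address>] symbol+0xoffset/0xsize"
FRAME_RE = re.compile(
    r'\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)')


def parse_frame(line):
    """Split an oops frame into its parts, or return None if no frame is found."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    address = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    return {
        "address": address,                # address of the instruction
        "symbol": match.group(2),          # nearest preceding symbol
        "offset": offset,                  # distance from that symbol
        "size": int(match.group(4), 16),   # size attributed to the symbol
        "symbol_start": address - offset,  # where the symbol begins
    }


frame = parse_frame(" [<ffffffff804ec090>] __sched_text_start+0x460/0x889")
print(frame["symbol"], hex(frame["symbol_start"]))
```

The computed `symbol_start` can be compared against System.map for the same build to see which functions actually lie at and after that address.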

I'll try to shove rc5-mm1 out the door this evening, minus rsdl.  And
-mm2, with rsdl.

From: Badari Pulavarty
Date: Monday, March 26, 2007 - 4:43 pm

On Mon, 2007-03-26 at 15:22 -0700, Andrew Morton wrote:

Okay, my ppc64 box hangs on boot. It could be different. I will wait
till rc5-mm1 for debugging that one.

Thanks,
Badari

...
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
ReiserFS: sda2: found reiserfs format "3.6" with standard journal
ReiserFS: sda2: using ordered data mode
ReiserFS: sda2: journal params: device sda2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: sda2: checking transaction log (sda2)
ReiserFS: sda2: Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 272k freed
Warning: unable to open an initial console.
ioctl32(showconsole:1020): Unknown cmd fd(0) cmd(40045432){00} arg(ffecdb48) on /dev/tty0
ioctl32(showconsole:1048): Unknown cmd fd(0) cmd(40045432){00} arg(ffb4aad8) on /dev/tty0
PDC20275: IDE controller at PCI slot 0002:d0:01.0
PDC20275: chipset revision 1
PDC20275: PLL input clock is 32814 kHz
PDC20275: 100% native mode on irq 119
    ide2: BM-DMA at 0x2eec00-0x2eec07, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0x2eec08-0x2eec0f, BIOS settings: hdg:pio, hdh:pio
scsi 0:0:15:0: Attached scsi generic sg0 type 13
scsi 0:255:255:255: Attached scsi generic sg1 type 31
sd 1:0:4:0: Attached scsi generic sg2 type 0
sd 1:0:5:0: Attached scsi generic sg3 type 0
scsi 1:0:15:0: Attached scsi generic sg4 type 13
scsi 1:255:255:255: Attached scsi generic sg5 type 31
hde: IBM DROM00205, ATAPI CD/DVD-ROM drive
ide2 at 0x2ee400-0x2ee407,0x2edc02 on irq 119
hde: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
ioctl32(showconsole:1918): Unknown cmd fd(0) cmd(40045432){00} arg(fff00ad8) on /dev/tty0
loop: loaded (max 8 devices)
ioctl32(showconsole:2091): Unknown cmd fd(0) cmd(40045432){00} arg(fff71ae8) on /dev/tty0
Adding 1050616k swap ...