Re: [regression]2.6.25-rc4: boot panic on alpha

Previous thread: [PATCH 3/5] net: ehea: locking order correction by Daniel Walker on Thursday, March 6, 2008 - 1:00 am. (1 message)

Next thread: Re: [PATCH 2.6.24] chroot= as a new kernel parameter by Greg Schafer on Thursday, March 6, 2008 - 9:53 pm. (1 message)
From: Bob Tracy
Date: Thursday, March 6, 2008 - 8:41 pm

The regression is relative to 2.6.25-rc1 + timer fixes (added when
I reported the problems with 2.6.25-rc1 relative to 2.6.24).  I haven't
tried -rc2 or -rc3.

Might be related to similar reports I've seen, but wanted to add the
Alpha platform to the list.  The -rc4 kernel boots, but at some point
following initialization of the SCSI layer, there's a panic with enough
diagnostic output that a 50-line screen can't hold it all.  Unfortunately,
the system logging isn't active, so nothing gets captured.

The panic is completely reproducible.  The "Code" line at the bottom
of the panic output is

44220001 4821f621 4821b681 4821f62b f420004f 00000081 <0000009b> 0066c012

The first line of the "Trace" section is

 [<fffffc000031a41f>] iommu_arena_alloc+0x64/0xe0

Platform is an Alpha PWS 433au.  gcc is version 4.1.2 (Debian 4.1.1-21).

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--

From: Ingo Molnar
Date: Friday, March 7, 2008 - 12:23 am

[Cc:-ed FUJITA Tomonori who did the Alpha IOMMU changes - full report 
from Bob quoted below.]

Bob, does latest -git boot if you revert these 4 commits:

commit d5a4630a0daad241c761064295958554472ed491
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Tue Mar 4 14:28:58 2008 -0800

    alpha: remove unused DEBUG_FORCEDAC define in IOMMU

commit cf5401454863df8e6dc3ebe8faad09141cbec187
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Tue Mar 4 14:28:57 2008 -0800

    alpha: make IOMMU respect the segment boundary limits

commit 23d7e0390ab57cf15a5cfe8d6806192f0997e5a8
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Tue Mar 4 14:28:57 2008 -0800

    alpha: IOMMU had better access to the free space bitmap at only one place

commit 3c5f1def7dd50b792f56dcf7378c2684c06947f3
Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date:   Tue Mar 4 14:28:54 2008 -0800

    alpha: convert IOMMU to use ALIGN()

	Ingo

--

From: Bob Tracy
Date: Friday, March 7, 2008 - 5:36 am

(FUJITA Tomonori's Alpha IOMMU changes)

I'll get things set up to try that later today...  In the meantime, it
appears the regression is relative to -rc3: I'm up and running on that

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--

From: Ingo Molnar
Date: Friday, March 7, 2008 - 7:14 am

yes, these iommu changes were added post-rc3, they came from -mm and 
were i think declared to be untested on Alpha. So i'd strongly suspect 
them.

	Ingo
--

From: FUJITA Tomonori
Date: Friday, March 7, 2008 - 7:56 am

On Fri, 7 Mar 2008 15:14:40 +0100

Very sorry, I probably broke the IOMMU. I don't have the hardware so
I've not tested the patches.

Bob, if reverting the patches works, can you please try the following
patch? If it works, please let me know about the kernel message.

Thanks,

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index be6fa10..a6bef1d 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -129,8 +129,9 @@ static inline int is_span_boundary(unsigned int index, unsigned int nr,
 				   unsigned long shift,
 				   unsigned long boundary_size)
 {
-	shift = (shift + index) & (boundary_size - 1);
-	return shift + nr > boundary_size;
+	return 0;
+/* 	shift = (shift + index) & (boundary_size - 1); */
+/* 	return shift + nr > boundary_size; */
 }
 
 /* Must be called with the arena lock held */
@@ -144,7 +145,9 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
 	unsigned long base;
 	unsigned long boundary_size;
 
-	BUG_ON(arena->dma_base & ~PAGE_MASK);
+	if (arena->dma_base & ~PAGE_MASK)
+		printk("%s %d: %lx\n", __FUNCTION__, __LINE__, arena->dma_base);
+
 	base = arena->dma_base >> PAGE_SHIFT;
 	if (dev)
 		boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
@@ -152,7 +155,8 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
 	else
 		boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;
 
-	BUG_ON(!is_power_of_2(boundary_size));
+	if (!is_power_of_2(boundary_size))
+		printk("%s %d: %lx\n", __FUNCTION__, __LINE__, boundary_size);
 
 	/* Search forward for the first mask-aligned sequence of N free ptes */
 	ptes = arena->ptes;
--

From: Ivan Kokshaysky
Date: Friday, March 7, 2008 - 9:21 am

Unfortunately, I was not able to test these patches either, as my
alpha with iommmu and scsi is disassembled at the moment, thanks to

				      ^^^^^^^^^^^^^^^^^^^^^^^^^


Perhaps it was that, if the scsi driver set max_seg_size to something
other than default 65536 (and not a power of two)...

Bob, can you give it a try?

Ivan.

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index be6fa10..e07a23f 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -144,15 +144,14 @@ iommu_arena_find_pages(struct device *dev, struct pci_iommu_arena *arena,
 	unsigned long base;
 	unsigned long boundary_size;
 
-	BUG_ON(arena->dma_base & ~PAGE_MASK);
 	base = arena->dma_base >> PAGE_SHIFT;
-	if (dev)
-		boundary_size = ALIGN(dma_get_max_seg_size(dev) + 1, PAGE_SIZE)
-			>> PAGE_SHIFT;
-	else
-		boundary_size = ALIGN(1UL << 32, PAGE_SIZE) >> PAGE_SHIFT;
-
-	BUG_ON(!is_power_of_2(boundary_size));
+	if (dev) {
+		boundary_size = dma_get_seg_boundary(dev) + 1;
+		BUG_ON(!is_power_of_2(boundary_size));
+		boundary_size >>= PAGE_SHIFT;
+	} else {
+		boundary_size = 1UL << (32 - PAGE_SHIFT);
+	}
 
 	/* Search forward for the first mask-aligned sequence of N free ptes */
 	ptes = arena->ptes;
--

From: FUJITA Tomonori
Date: Friday, March 7, 2008 - 9:45 am

On Fri, 7 Mar 2008 19:21:16 +0300




Thanks!
--

From: Bob Tracy
Date: Friday, March 7, 2008 - 12:06 pm

In the interest of efficiency, I'll try Ivan's patch against -rc4 before
I do anything else (although I *do* have my git tree updated and ready
for abuse if it comes to that :-)).  New kernel is building: I'll be able
to test it in about five hours (when I can get back to where the machine
is).

As usual, thanks to everyone for the quick responses.

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--

From: Bob Tracy
Date: Friday, March 7, 2008 - 6:24 pm

This fixes things for me.  I'm running 2.6.25-rc4 + the above patch as
I type this.

Thanks!

-- 
------------------------------------------------------------------------
Bob Tracy          |  "I was a beta tester for dirt.  They never did
rct@frus.com       |   get all the bugs out." - Steve McGrew on /.
------------------------------------------------------------------------
--

From: FUJITA Tomonori
Date: Sunday, March 9, 2008 - 3:20 am

On Fri, 7 Mar 2008 19:24:40 -0600 (CST)

Great, thanks for testing!

Ivan, can you resend the patch with a description and signed-off-by?

Thanks,
--

From: Ivan Kokshaysky
Date: Sunday, March 9, 2008 - 6:22 am

I was going to do that, but since Andrew has picked up that patch
I thought it wasn't necessary...

Anyway, resending shouldn't hurt. I assume that I can add your Acked-by.

Ivan.
--

From: FUJITA Tomonori
Date: Sunday, March 9, 2008 - 6:29 am

On Sun, 9 Mar 2008 16:22:56 +0300

Maybe, but the -mm commit mail was not sent to linux-kernel or
Linus. I just wanted to make sure that the patch will be applied to

Thanks! Of course, you can add my Acked-by.
--

From: Ivan Kokshaysky
Date: Sunday, March 9, 2008 - 8:38 am

BTW, I feel a bit sorry for not ACKing your patches in time - I wasn't able
to test them then and I was perhaps too overcautious because of that...
But they are really good.

Ivan.
--

From: FUJITA Tomonori
Date: Monday, March 10, 2008 - 7:34 am

On Sun, 9 Mar 2008 18:38:59 +0300

No problem, thanks for finding my stupid typo quickly :)

The patchset was merged instantly so it was difficult to review it in
time.
--

From: Andrew Morton
Date: Sunday, March 9, 2008 - 11:47 am

Yup, I have it in my next-batch-for-Linus.  Maybe tomorrow..
--

Previous thread: [PATCH 3/5] net: ehea: locking order correction by Daniel Walker on Thursday, March 6, 2008 - 1:00 am. (1 message)

Next thread: Re: [PATCH 2.6.24] chroot= as a new kernel parameter by Greg Schafer on Thursday, March 6, 2008 - 9:53 pm. (1 message)