Re: [xfs-masters] Re: 2.6.22-rc1-mm1

From: Andrew Morton
Date: Tuesday, May 15, 2007 - 8:19 pm

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc1/2.6.22-rc1-mm1/


- I found some time to look into some writeback problems in
  fs/fs-writeback.c.  The results were ugly.  There are a pile of fixes here
  but more work (mainly testing) needs to be done.

  There's some new debug code in there which could be very expensive if
  there are a lot of dirty inodes in the machine (quadratic behaviour).  If
  the machine seems to be affected by this, the debugging may be disabled with

	echo 0 > /proc/sys/fs/inode_debug

- Added an i386 early-startup development tree, as git-newsetup.patch ("H. 
  Peter Anvin" <hpa@zytor.com>)

- Brought back git-sas.patch (Darrick J.  Wong <djwong@us.ibm.com>).  It got
  lost quite some time ago.



Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

        echo "subscribe mm-commits" | mail majordomo@vger.kernel.org

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

        http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers ...
From: KAMEZAWA Hiroyuki
Date: Tuesday, May 15, 2007 - 11:06 pm

On Tue, 15 May 2007 20:19:14 -0700

If CONFIG_SCSI=y && CONFIG_ATA=n, 

==
ERROR: "ata_sas_slave_configure" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_port_disable" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_port_init" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_port_stop" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_port_start" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_port_alloc" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_noop_qc_prep" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_tf_to_fis" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_noop_dev_select" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_tf_from_fis" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_host_init" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_queuecmd" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_sas_port_destroy" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_scsi_ioctl" [drivers/scsi/libsas/libsas.ko] undefined!
ERROR: "ata_qc_complete" [drivers/scsi/libsas/libsas.ko] undefined!
make[1]: *** [__modpost] Error 1
==

This error occurs.

-Kame

-

From: Jeff Garzik
Date: Wednesday, May 16, 2007 - 12:58 am

Looks like SAS needs to require CONFIG_ATA...

	Jeff



-

From: Andrew Morton
Date: Wednesday, May 16, 2007 - 1:04 am

Yes, but it seems wrong to disable all of libsas if !ATA.  Only sas_ata.o
should depend on that.
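
One common way to let a core library link when an optional part is compiled out is for that part to supply no-op stubs. The sketch below only illustrates the idiom; `CONFIG_EXAMPLE_SAS_ATA`, `struct example_port` and both `example_*` functions are hypothetical names, not symbols from libsas or libata.

```c
/* Hypothetical config symbol: defined only when ATA support is built. */
/* #define CONFIG_EXAMPLE_SAS_ATA 1 */

struct example_port {
	int is_sata;	/* nonzero if a SATA device sits behind this port */
};

#ifdef CONFIG_EXAMPLE_SAS_ATA
/* The real implementation would live in the ATA-only object file. */
int example_sas_ata_init_port(struct example_port *port);
#else
/* Stub: without ATA support, SATA devices are reported as unsupported. */
static inline int example_sas_ata_init_port(struct example_port *port)
{
	(void)port;
	return -1;
}
#endif

/* Core code calls the helper unconditionally and links either way. */
int example_port_setup(struct example_port *port)
{
	if (!port->is_sata)
		return 0;	/* plain SAS device: no ATA code needed */
	return example_sas_ata_init_port(port);
}
```

With the symbol undefined, the core still builds and only SATA devices are refused.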

Darrick, is there any point in me carrying this tree?  It doesn't appear to
be a hotbed of activity...
-

From: Jeff Garzik
Date: Wednesday, May 16, 2007 - 8:33 am

Agreed.

	Jeff


-

From: Darrick J. Wong
Date: Wednesday, May 16, 2007 - 1:24 pm

Nope.  I haven't worked on those bits of code in quite a while, since a
number of scsi/libata reorganizations were discussed at the storage
summit that would make a fair amount of the sas_ata code unnecessary (or
candidates for reworking).

--D
From: Randy Dunlap
Date: Wednesday, May 16, 2007 - 9:54 am

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Cornelia Huck
Date: Wednesday, May 16, 2007 - 12:57 am

On Tue, 15 May 2007 20:19:14 -0700,

Doesn't build on s390 when selecting the md menu:

drivers/built-in.o(.text+0x4423e): more undefined references to
`dma_map_page' follow

This is caused by the following in drivers/md/Kconfig:

menuconfig MD
        bool "Multiple devices driver support (RAID and LVM)"
        depends on BLOCK
        select ASYNC_TX_DMA
        help
          Support multiple physical spindles through a single logical device.
          Required for RAID and logical volume management.

ASYNC_TX_DMA is defined in drivers/dma/Kconfig, which has

menu "DMA Engine support"
        depends on !S390

but unfortunately ASYNC_TX_DMA depends neither on the menu nor
on !S390. (I think it was just an unknown symbol on s390 before
Martin's Kconfig rework, so I could build older -mm kernels.)

Currently, the only md stuff depending on ASYNC_TX_DMA is MD_RAID456
(which means it doesn't work on s390 anymore, which is bad enough).
With the select statement, no md stuff can be built on s390 at all (and
I really don't see why ASYNC_TX_DMA should be forced upon all md
users)...
-

From: Williams, Dan J
Date: Wednesday, May 16, 2007 - 10:21 am

The rationale for the 'select' here was to attempt to prevent user

I agree it should not be forced on all users, I will push the following
change:

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 4a1b77e..fd29a54 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -8,7 +8,6 @@ menu "Multi-device support (RAID and LVM)"

 config MD
        bool "Multiple devices driver support (RAID and LVM)"
-       select ASYNC_TX_DMA
        help
          Support multiple physical spindles through a single logical device.
          Required for RAID and logical volume management.
@@ -109,7 +108,8 @@ config MD_RAID10

 config MD_RAID456
        tristate "RAID-4/RAID-5/RAID-6 mode"
-       depends on BLK_DEV_MD && ASYNC_TX_DMA
+       depends on BLK_DEV_MD
+       select ASYNC_TX_DMA
        ---help---
          A RAID-5 set of N drives with a capacity of C MB per drive provides
          the capacity of C * (N - 1) MB, and protects against a failure

However this still will not allow s390 to build MD_RAID456.  This
dependency is in place because the xor.o object has moved from
drivers/md to drivers/dma.  The goal of the interface is to support
using offload engines when they are present, and use software routines
(like xor_block) when engines are not available.  In other words, the
intent is that DMA_ENGINE=n && ASYNC_TX_DMA=y is a valid configuration.
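
The "use an engine when present, otherwise fall back to software" intent can be sketched roughly as follows. This is not the async_tx interface: `struct example_engine`, `example_soft_xor` and `example_async_xor` are made-up names, and the software routine merely stands in for something like xor_block.

```c
#include <stddef.h>

/* Hypothetical offload engine handle; NULL means no engine is present. */
struct example_engine {
	void (*xor)(unsigned char *dst, const unsigned char *src, size_t len);
};

/* Software fallback, standing in for a routine such as xor_block. */
static void example_soft_xor(unsigned char *dst, const unsigned char *src,
			     size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/*
 * Use the engine when one is available, otherwise fall back to the CPU.
 * Returns 1 if the work was offloaded and 0 if it ran in software.
 */
int example_async_xor(struct example_engine *eng, unsigned char *dst,
		      const unsigned char *src, size_t len)
{
	if (eng && eng->xor) {
		eng->xor(dst, src, len);
		return 1;
	}
	example_soft_xor(dst, src, len);
	return 0;
}
```

Callers see the same entry point whether or not an engine exists, which is why a configuration with no engine driver can still be valid.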

Can we rework the !S390 change to the DMA_ENGINE menu?  It seems to me
that S390 should follow the ARM example and only enable the driver menus
they want in arch/s390/Kconfig, no?

...

On a closer look, it seems async_tx should be its own directory like
crypto...  I'll post the incremental changes to the md-accel git tree
for review.

Dan
-

From: Andy Whitcroft
Date: Wednesday, May 16, 2007 - 3:18 am

Getting this on both x86 and x86_64 boxes, they are the older boxen so
likely older compilers:

  CC      arch/x86_64/boot/memory.o
arch/i386/boot/memory.c: In function `detect_memory':
arch/i386/boot/memory.c:32: error: can't find a register in class `DREG'
while reloading `asm'

Seems to come from git-newsetup, but that tree isn't pulled into your
git version of -mm so I can't be more specific.

-apw
-

From: H. Peter Anvin
Date: Wednesday, May 16, 2007 - 8:16 am

Does the following patch work for you?

	-hpa

From: Mel Gorman
Date: Wednesday, May 16, 2007 - 10:40 am

With the patch, elm3b6 from test.kernel.org builds and boots. It's
x86_64. elm3b132 which is x86 fails with

  CC      arch/i386/boot/video-bios.o
  HOSTCC  arch/i386/boot/tools/build
  AS      arch/i386/boot/compressed/head.o
  CC      arch/i386/boot/compressed/misc.o
  OBJCOPY arch/i386/boot/compressed/vmlinux.bin
  LD      arch/i386/boot/setup.elf
ld:arch/i386/boot/setup.ld:47: syntax error
make[1]: *** [arch/i386/boot/setup.elf] Error 1
make[1]: *** Waiting for unfinished jobs....
  GZIP    arch/i386/boot/compressed/vmlinux.bin.gz
include/asm/processor.h: In function `native_get_debugreg':
include/asm/processor.h:531: warning: asm operand 0 probably doesn't
match constraints
include/asm/processor.h: In function `native_set_debugreg':
include/asm/processor.h:558: warning: asm operand 0 probably doesn't
match constraints
  LD      arch/i386/boot/compressed/piggy.o
  LD      arch/i386/boot/compressed/vmlinux
make: *** [bzImage] Error 2
05/16/07-17:27:44 Build the kernel. Failed rc = 2
05/16/07-17:27:44 build: kernel build Failed rc = 1
Failed and terminated the run

I haven't checked yet if that has anything to do with git-newsetup or
not.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-

From: H. Peter Anvin
Date: Wednesday, May 16, 2007 - 10:55 am

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

... as well as binutils version number (it appears that your version of

	-hpa
-

From: Andy Whitcroft
Date: Wednesday, May 16, 2007 - 11:18 am

Sorry, had to wait for the machine to become idle:

elm3b132:~# gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared
--with-system-zlib --enable-nls --without-included-gettext
--enable-__cxa_atexit --enable-clocale=gnu --enable-debug
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian 1:3.3.4-3)
elm3b132:~# dpkg -l | grep binutil
ii  binutils       2.14.90.0.7-8  The GNU assembler, linker and binary
utiliti

-apw
-

From: Andrew Morton
Date: Wednesday, May 16, 2007 - 11:00 am

On Wed, 16 May 2007 18:40:47 +0100

	ASSERT(_end <= 0x8000, "Setup too big!")


	static inline unsigned long native_get_debugreg(int regno)
	{
		unsigned long val = 0; 	/* Damn you, gcc! */
	
		switch (regno) {
		case 0:
			asm("movl %%db0, %0" :"=r" (val)); break;
		case 1:
			asm("movl %%db1, %0" :"=r" (val)); break;
		case 2:
			asm("movl %%db2, %0" :"=r" (val)); break;
		case 3:
			asm("movl %%db3, %0" :"=r" (val)); break;
		case 6:
			asm("movl %%db6, %0" :"=r" (val)); break;
		case 7:
			asm("movl %%db7, %0" :"=r" (val)); break;
		default:
-->			BUG();
		}
		return val;
	}

weird.

There are no significant changes in processor.h relative to 2.6.22-rc1.

If the file-n-line aren't screwed up, it's disliking

#define BUG()								\
	do {								\
		asm volatile("1:\tud2\n"				\
			     ".pushsection __bug_table,\"a\"\n"		\
			     "2:\t.long 1b, %c0\n"			\
			     "\t.word %c1, 0\n"				\
			     "\t.org 2b+%c2\n"				\
			     ".popsection"				\
			     : : "i" (__FILE__), "i" (__LINE__),	\
			     "i" (sizeof(struct bug_entry)));		\
		for(;;) ;						\
	} while(0)


It built and ran 2.6.22-rc1-git4 happily.


-

From: H. Peter Anvin
Date: Wednesday, May 16, 2007 - 4:32 pm

Does this patch fix it for you?

	-hpa
From: H. Peter Anvin
Date: Wednesday, May 16, 2007 - 4:36 pm

Correction, does *this patch* do it for you?

	-hpa

From: Mel Gorman
Date: Thursday, May 17, 2007 - 2:35 am

With these two patches in combination, previously failing machines elm3b6 
(x86_64 on test.kernel.org) and a modern x86 built a kernel and booted 
correctly.

elm3b132 and elm3b133 (x86 numaq on test.kernel.org) built with these 
patches but silently fail on boot with no output via earlyprintk. 
According to test.kernel.org, this failure occurs with git-newsetup 
reverted so it is a separate problem.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-

From: Andy Whitcroft
Date: Tuesday, May 29, 2007 - 3:34 pm

Ok, I've been following up on this failure on elm3b132/3.  I moved
forward to v2.6.22-rc2-mm1 and that also fails.  I ran a bisection on
the git-newsetup patch as carried in -mm and basically it came down to the
first patch, i.e. any and all of this tree stops the boot.

I just tried reproducing git-newsetup boot failures with the latest
version of your tree (369f16fdd423d79640c4390915e6ab71189cb497) and that
also fails.

"Fails" in this context means a hard boot failure after loading the kernel and
before anything is printed.  I also added a printf to the top of main()
in boot/main.c and it doesn't come out, not that I really know if that
means it got there or not.

Any suggestions how to debug this puppy?

-apw
-

From: Andy Whitcroft
Date: Friday, June 1, 2007 - 2:50 am

Thanks to Peter for all his encouragement off list.

I cannot claim to have sorted this one out, but I do understand why
my experiences and Mel's did not seem consistent.  Basically I am getting
inconsistent results with different machines.

I started my debug on a machine where 2.6.22-rc2 worked and
2.6.22-rc2+newsetup did not.  I debugged the latter and managed to
prove that it was in fact getting all the way to the kernel
decompressor, and then crashing hard.  The gzip image in memory was
intact and yet it did not decompress correctly: about the first 60% was
intact, the remainder was damaged.

Suspecting that this was an "uncompress in place" overlap problem I
moved the compressed kernel way up out of the way and this then booted
successfully.  Experimenting, I was able to get it to boot by increasing
the overlap 'gap' from 32KB to 64KB.  I was able to use the same
patch to boot 2.6.22-rc2-mm1 on the same problem machines.  However,
this same overlap change did not fix another similar machine (the one in
the TKO grid).

I think that my debugging says that newsetup got the compressed kernel
and decompressor into memory ok and execution passed to it normally.
But I cannot figure out where the corruption is coming from.  I tried
annotating the gzip decompressor to see if the input and output buffers
were overlapping at any time and that debug said no (unsure how reliable
that is).  And yet at some point the output image is munched up.
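
The kind of overlap check described above can be written as a pair of small predicates. This is only a sketch: the function names are invented, `gap` is assumed to mean the minimum distance the write position must keep behind the read position, and address wrap-around is ignored.

```c
#include <stdint.h>

/*
 * Return 1 if [a, a + a_len) and [b, b + b_len) share at least one byte.
 * Empty ranges never overlap anything.
 */
int example_ranges_overlap(uintptr_t a, uintptr_t a_len,
			   uintptr_t b, uintptr_t b_len)
{
	if (a_len == 0 || b_len == 0)
		return 0;
	return a < b + b_len && b < a + a_len;
}

/*
 * In-place decompression check: the compressed input sits near the end
 * of the output buffer, so the write position must stay at least `gap`
 * bytes behind the read position.
 */
int example_in_place_safe(uintptr_t out_pos, uintptr_t in_pos, uintptr_t gap)
{
	return out_pos + gap <= in_pos;
}
```

Calling the second predicate on every output flush would show whether the writer ever catches up with the reader; it says nothing about clobbers coming from outside the decompressor.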

One last piece of information.  The decompressor also always seems to
get to the end of the input stream in exactly the right place without
reporting any kind of error, that is with exactly 8 bytes left over for
the length and crc checks.  Which given the context sensitive nature of
the algorithm tends to imply the input stream was ok for the whole
duration of the decompress.  Yet the output stream is badly broken.
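
For reference, the 8 bytes left after the deflate stream are the gzip trailer: a CRC-32 of the uncompressed data followed by the uncompressed length modulo 2^32, both little-endian. A minimal parser might look like this; the struct and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

struct example_gzip_trailer {
	uint32_t crc32;	/* CRC-32 of the uncompressed data */
	uint32_t isize;	/* uncompressed length modulo 2^32 */
};

/* Read a 32-bit little-endian value from four bytes. */
static uint32_t example_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the 8 bytes that follow the deflate stream.
 * Returns 0 on success, -1 if fewer than 8 bytes remain.
 */
int example_parse_gzip_trailer(const unsigned char *buf, size_t remaining,
			       struct example_gzip_trailer *out)
{
	if (remaining < 8)
		return -1;
	out->crc32 = example_le32(buf);
	out->isize = example_le32(buf + 4);
	return 0;
}
```

Comparing `isize` and a CRC computed over the damaged output against these two fields would confirm whether the corruption happened after the bytes were produced.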

Anyone got any wacky suggestions ...

-apw
-

From: H. Peter Anvin
Date: Friday, June 1, 2007 - 4:12 pm

It definitely sounds like a memory clobber of some sort.

Usual suspects, in addition to the input/output buffers you already
looked at, would be the heap and the stack.  Finding where the stack
pointer lives would be my first, instinctive guess.
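
One generic way to look for such a clobber is to fill a guard region next to the stack or heap with a known pattern and re-check it later. The sketch below assumes nothing about the actual setup code; the names and the 0xA5 pattern are arbitrary choices for illustration.

```c
#include <stddef.h>

#define EXAMPLE_CANARY 0xA5u

/* Fill a guard region with a known byte pattern. */
void example_guard_arm(unsigned char *guard, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		guard[i] = EXAMPLE_CANARY;
}

/*
 * Return the offset of the first clobbered byte, or -1 if the guard
 * region is still intact.
 */
long example_guard_check(const unsigned char *guard, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (guard[i] != EXAMPLE_CANARY)
			return (long)i;
	return -1;
}
```

Arming guards at several addresses and checking them at intervals narrows down both where and when the stray write lands.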

	-hpa
-

From: Andy Whitcroft
Date: Tuesday, June 5, 2007 - 11:38 am

The stack seems to be where it should be and seems to stay pretty much
in the same place as it should.  Checks I added for the heap also show it
staying within bounds.  I've tried making the stack and the heap 64k to no
effect.

Moving the kernel to other places in memory seems to kill the decode
completely during gunzip(), which may be a hint; I am not sure.

This thing is trying to ruin my mind.

-apw
-

From: H. Peter Anvin
Date: Tuesday, June 5, 2007 - 3:57 pm

Yours and mine both.  Seems like *something* is clobbering memory, but
what and why is a mystery.  The fact that putting the kernel at a higher
point in memory makes a difference is a good indication that this clobber
is at a relatively high address.

How much RAM does this machine have?

	-hpa
-

From: Andy Whitcroft
Date: Thursday, June 7, 2007 - 2:49 am

This is a 12GB machine with 3 NUMA nodes.

I checked out the location of the IDT and GDT and both seem sane, in the
9xxxx range below the kernel destination.

I also note that on another machine of this type (with only one node in that
case), some of the "did work" cases do not work.  Also, when I applied some
of my patches on top, "working" cases stopped working.  So whatever
it is is definitely related to the shape of the kernel to be loaded.
Very confusing.

-apw
-

From: Andy Whitcroft
Date: Monday, June 11, 2007 - 6:58 am

Ok, in fact when the kernel is moved elsewhere in the address space it
will decode properly.  There was a check in there for not loading at the
right address which was catching me out ... as errors do not show up as
we have no serial support.  Doh.

Once I had gotten this decoding at other addresses I simply tried moving
the base address for the kernel elsewhere.  I am able to successfully
boot the kernel at 16MB and 256MB.  This seems like something outside
the decoder scribbling.

I would not normally recommend moving the base address of the kernel.
However, this problem at least so far has only shown up on the NUMA-Q
platform which is at best described as a very small volume
sub-architecture.  There are areas in which it differs from mainstream
BIOS and we are no longer able to get details of these differences.

We actually have no proof as yet this is or is not a NUMA-Q specific
problem.  For instance these machines tend to run fewer modules and more
builtin stuff than the average due to an owner dislike of modules.  So
we could have a lurking kernel size issue or similar.

I am therefore proposing to change the base address for NUMA-Q only (patch
to follow this email).  And that we remain aware of the issue and on the
lookout for similar breakage on mainstream x86 platforms.  At least with
this patch we can get wider testing on the rest of the kernel.

-apw
-

From: Bharata B Rao
Date: Wednesday, May 16, 2007 - 9:16 pm

Observed same problem with gcc version 3.4.4 20050721 (Red Hat
3.4.4-2) and binutils-2.15.92.0.2-15 and the above patch fixes it.

Regards,
Bharata.
-- 
"Men come and go but mountains remain" -- Ruskin Bond.
-

From: young dave
Date: Friday, May 18, 2007 - 1:54 am

Hi,
I have the same problem, your patch fixed it.

Regards
dave
-

From: young dave
Date: Friday, May 18, 2007 - 3:07 am

Hi,

After installing the new mm1 kernel, my system cannot boot; the rc1
kernel works ok.

The cursor just blinks after the "Bios data check successful" message appears.

What do you think about this?
-

From: H. Peter Anvin
Date: Friday, May 18, 2007 - 9:54 am

"Bios data check successful" is not a message that comes from Linux, nor
from the boot loader.

Since you have left absolutely zero details about your system or
anything else, there isn't much anyone can do about it.

	-hpa

-

From: Mel Gorman
Date: Friday, May 18, 2007 - 9:59 am

It sounds vaguely similar to the silent failure on elm3b132. I'm still 
bisecting this on the side. It's taking an age because the target machine 
is so slow and using a faster machine with a different compiler does not 
reproduce the problem. I don't think it's git-newsetup that is the problem 
though for what that's worth.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-

From: young dave
Date: Sunday, May 20, 2007 - 5:53 pm

Hi,
My CPU is an Intel(R) Pentium(R) D CPU 2.80GHz; below are the lspci
output and the kernel config.

-------lspci-----------
00:00.0 Host bridge: Intel Corporation 945G/GZ/P/PL Express Memory
Controller Hub (rev 02)
00:01.0 PCI bridge: Intel Corporation 945G/GZ/P/PL Express PCI Express
Root Port (rev 02)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High
Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express
Port 1 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2
EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC
Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE
Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family)
Serial ATA Storage Controller IDE (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc RV380 [Radeon
X600 (PCIE)]
01:00.1 Display controller: ATI Technologies Inc RV380 [Radeon X600]
03:08.0 Ethernet controller: Intel Corporation 82801G (ICH7 Family)
LAN Controller (rev 01)

------------config-------------

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc1-mm1
# Fri May 18 15:55:20 ...
From: H. Peter Anvin
Date: Sunday, May 20, 2007 - 9:49 pm

Could you please try booting with "vga=ask", and see if you get the
video mode selection menu?

	-hpa
-

From: young dave
Date: Sunday, May 20, 2007 - 10:00 pm

Hi,

I tried the vga option, and the selection menu appeared; then I
selected 0 (80x25) and nothing happened.

-

From: H. Peter Anvin
Date: Sunday, May 20, 2007 - 10:03 pm

OK.

Could you put printf's in the setup code (especially
arch/i386/boot/main.c) to see how far it runs before it dies?

	-hpa
-

From: young dave
Date: Sunday, May 20, 2007 - 10:39 pm

I added some debug info to main.c; the result is that the kernel stopped
in query_edd().

Then I used the kernel argument edd=off, and the kernel booted happily.

I will read edd.c to see what happened. Do you have any suggestions?
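
As a rough picture of how an option such as edd=off can gate a BIOS query, the sketch below scans a command line for a whole-word match. It is not the code from edd.c; `example_cmdline_has` and `example_should_query_edd` are hypothetical helpers.

```c
#include <string.h>

/*
 * Return 1 if `opt` appears in `cmdline` as a whole space-separated
 * word, 0 otherwise.
 */
int example_cmdline_has(const char *cmdline, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = cmdline;

	if (len == 0)
		return 0;
	while ((p = strstr(p, opt)) != NULL) {
		int starts = (p == cmdline) || (p[-1] == ' ');
		int ends = (p[len] == '\0') || (p[len] == ' ');

		if (starts && ends)
			return 1;
		p++;	/* keep searching after this partial match */
	}
	return 0;
}

/* Hypothetical caller: skip the EDD BIOS query when edd=off is given. */
int example_should_query_edd(const char *cmdline)
{
	return !example_cmdline_has(cmdline, "edd=off");
}
```

Skipping the query this way avoids the hang but does not explain it; the BIOS call itself still needs investigating.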
-

From: Jiri Slaby
Date: Wednesday, May 16, 2007 - 5:10 am

I've got this in dmesg:

BUG: at /local/xslaby/xxx/mm/page-writeback.c:829 __set_page_dirty_nobuffers()
 [<c010531e>] dump_trace+0x1ce/0x200
 [<c010536a>] show_trace_log_lvl+0x1a/0x30
 [<c0106012>] show_trace+0x12/0x20
 [<c0106086>] dump_stack+0x16/0x20
 [<c015566d>] __set_page_dirty_nobuffers+0x11d/0x130
 [<c0155690>] redirty_page_for_writepage+0x10/0x20
 [<c01938fc>] __block_write_full_page+0x20c/0x330
 [<c0193b0a>] block_write_full_page+0xea/0x100
 [<c0196c82>] blkdev_writepage+0x12/0x20
 [<c015539e>] __writepage+0xe/0x30
 [<c01558c2>] write_cache_pages+0x222/0x340
 [<c0155a03>] generic_writepages+0x23/0x30
 [<c0155a3e>] do_writepages+0x2e/0x50
 [<c018decb>] __writeback_single_inode+0x8b/0x470
 [<c018e75b>] generic_sync_sb_inodes+0x24b/0x470
 [<c018e9a7>] sync_sb_inodes+0x27/0x30
 [<c018ec33>] writeback_inodes+0xb3/0xe0
 [<c01560f2>] wb_kupdate+0x82/0xf0
 [<c015660b>] pdflush+0xeb/0x1b0
 [<c0132e72>] kthread+0x42/0x70
 [<c0104d4b>] kernel_thread_helper+0x7/0x1c
 =======================
BUG: at /local/xslaby/xxx/mm/page-writeback.c:829 __set_page_dirty_nobuffers()
 [<c010531e>] dump_trace+0x1ce/0x200
 [<c010536a>] show_trace_log_lvl+0x1a/0x30
 [<c0106012>] show_trace+0x12/0x20
 [<c0106086>] dump_stack+0x16/0x20
 [<c015566d>] __set_page_dirty_nobuffers+0x11d/0x130
 [<f8b1fc5b>] nfs_updatepage+0x7b/0x200 [nfs]
 [<f8b156df>] nfs_commit_write+0x2f/0x50 [nfs]
 [<c0150911>] generic_file_buffered_write+0x2a1/0x660
 [<c0150f52>] __generic_file_aio_write_nolock+0x282/0x520
 [<c0151252>] generic_file_aio_write+0x62/0xd0
 [<f8b15def>] nfs_file_write+0xef/0x1c0 [nfs]
 [<c01715e0>] do_sync_write+0xd0/0x110
 [<c0171e04>] vfs_write+0x94/0x130
 [<c017248d>] sys_write+0x3d/0x70
 [<c01040e8>] syscall_call+0x7/0xb
 [<b7eb7b3e>] 0xb7eb7b3e
 =======================

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
 B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 ...
From: Nick Piggin
Date: Wednesday, May 16, 2007 - 5:39 am

Do you have any messages before this one? Seems like it is probably metadata,

This one is NFS, setting the page dirty while it is not uptodate. Trond,
is this because NFS keeps track of dirty regions of the page with private
data? It might make sense to avoid this warning if PagePrivate is set...
would that fix the NFS case?

-- 
SUSE Labs, Novell Inc.
-

From: Jiri Slaby
Date: Wednesday, May 16, 2007 - 5:44 am

No other messages before that. Bazillion through-nfs stacks after this...

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
 B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E
-

From: Nick Piggin
Date: Wednesday, May 16, 2007 - 5:47 am

Does this patch fix NFS?

-- 
SUSE Labs, Novell Inc.
From: Trond Myklebust
Date: Wednesday, May 16, 2007 - 6:00 am

Ah... You put an extra WARN_ON(!PageUptodate(page)). err=-ENOCOFFEE, I
missed that...


So yes, in order to avoid having to read the page in when we just want
to write data, NFS does this kind of tracking. I dunno if your fix to
change it to !PagePrivate(page) && !PageUptodate(page) is right though.
It will indeed fix the NFS case, but the block system uses PagePrivate()
pretty extensively for its own nefarious ends (tracking page buffers).
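
The difference between the two warning conditions can be modelled with plain flags. This is a simplified stand-in, not kernel code: `struct example_page` and both functions are invented for the example, and the two fields only mimic PageUptodate() and PagePrivate().

```c
/* Hypothetical stand-ins for the page flags discussed above. */
struct example_page {
	int uptodate;		/* page contents fully valid */
	int has_private;	/* owner tracks extra per-page state */
};

/* Original check: warn whenever a non-uptodate page is dirtied. */
int example_warn_strict(const struct example_page *page)
{
	return !page->uptodate;
}

/*
 * Relaxed check: stay quiet if the owner keeps private state for the
 * page, since it may be tracking which sub-regions are dirty itself.
 */
int example_warn_relaxed(const struct example_page *page)
{
	return !page->has_private && !page->uptodate;
}
```

A page that is not uptodate but carries private state (the NFS case) trips only the strict check.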

Trond

-

From: Nick Piggin
Date: Wednesday, May 16, 2007 - 6:06 am

I think that's OK: the block layer is similarly happy to mark a !uptodate
page dirty if it has buffers, for similar reasons... Anyway, it won't use
this particular path when buffers are attached, and I've put similar
debugging stuff in the set_page_dirty_buffers part.

-- 
SUSE Labs, Novell Inc.
-

From: Trond Myklebust
Date: Wednesday, May 16, 2007 - 5:52 am

The first Oops is not NFS: it is some block file system, however the
problem is the same. The crux of the matter would appear to be that some
task is changing the page_mapping() of random pages while the page lock
is held by another task.

Do you see the same thing in mainline?

Trond

-

From: Michal Piotrowski
Date: Wednesday, May 16, 2007 - 7:30 am

This might be related

[   97.740021] BUG: at /home/devel/linux-mm/mm/page-writeback.c:829 __set_page_dirty_nobuffers()
[   97.748632]  [<c0105276>] dump_trace+0x63/0x1eb
[   97.753275]  [<c0105418>] show_trace_log_lvl+0x1a/0x30
[   97.758521]  [<c010605a>] show_trace+0x12/0x14
[   97.763042]  [<c01060f7>] dump_stack+0x16/0x18
[   97.767590]  [<c01677b3>] __set_page_dirty_nobuffers+0xfe/0x16e
[   97.773598]  [<c0167833>] redirty_page_for_writepage+0x10/0x12
[   97.779491]  [<c01a473a>] __block_write_full_page+0x1dc/0x335
[   97.785328]  [<c01a495c>] block_write_full_page+0xc9/0xd1
[   97.790799]  [<c01a781a>] blkdev_writepage+0x12/0x14
[   97.795829]  [<c01674ea>] __writepage+0xe/0x29
[   97.800350]  [<c01679b8>] write_cache_pages+0x183/0x29a
[   97.805683]  [<c0167af1>] generic_writepages+0x22/0x2a
[   97.810929]  [<c0167b1c>] do_writepages+0x23/0x34
[   97.815702]  [<c019f0a3>] __writeback_single_inode+0x245/0x472
[   97.821632]  [<c019f7e6>] generic_sync_sb_inodes+0x347/0x4cc
[   97.827379]  [<c019f98b>] sync_sb_inodes+0x20/0x24
[   97.832247]  [<c019fb93>] writeback_inodes+0x79/0xc2
[   97.837296]  [<c0168173>] wb_kupdate+0x7a/0xdb
[   97.841833]  [<c01686a0>] pdflush+0xf1/0x189
[   97.846173]  [<c0137d41>] kthread+0x3b/0x62
[   97.850461]  [<c0104e3f>] kernel_thread_helper+0x7/0x10

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc1-mm1/mm-dmesg
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc1-mm1/mm-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)
-

From: Nick Piggin
Date: Wednesday, May 16, 2007 - 7:37 am

Michal Piotrowski wrote:
> Andrew Morton napisa
From: Gabriel C
Date: Wednesday, May 16, 2007 - 8:34 am

This one as well I guess :

[14649.407909] BUG: at mm/page-writeback.c:829 __set_page_dirty_nobuffers()
[14649.407945]  [<c0156bb3>] __set_page_dirty_nobuffers+0x9a/0x104
[14649.407976]  [<c018cfea>] __block_write_full_page+0x1b7/0x2f1
[14649.407999]  [<e8ba89b4>] ext3_get_block+0x0/0xd0 [ext3]
[14649.408039]  [<c018d1f2>] block_write_full_page+0xce/0xd6
[14649.408054]  [<e8ba743a>] walk_page_buffers+0x4d/0x67 [ext3]
[14649.408072]  [<e8ba89b4>] ext3_get_block+0x0/0xd0 [ext3]
[14649.408096]  [<e8ba9f52>] ext3_ordered_writepage+0xdc/0x189 [ext3]
[14649.408115]  [<e8ba7454>] bget_one+0x0/0x7 [ext3]
[14649.408142]  [<c01569cf>] __writepage+0xb/0x26
[14649.408153]  [<c0156d88>] write_cache_pages+0x161/0x274
[14649.408166]  [<c01569c4>] __writepage+0x0/0x26
[14649.408187]  [<e8beaa03>] rtl8139_interrupt+0x3cd/0x3d7 [8139too]
[14649.408217]  [<c01c97c3>] __next_cpu+0x15/0x26
[14649.408229]  [<c011b561>] find_busiest_group+0x1c9/0x54a
[14649.408251]  [<c0156eba>] generic_writepages+0x1f/0x27
[14649.408263]  [<c0156eee>] do_writepages+0x2c/0x34
[14649.408275]  [<c018845d>] __writeback_single_inode+0x1c3/0x3aa
[14649.408295]  [<c0188235>] __check_dirty_inode_list+0x21/0x86
[14649.408321]  [<c0188a46>] generic_sync_sb_inodes+0x267/0x3a8
[14649.408347]  [<c0188f49>] writeback_inodes+0x63/0xaa
[14649.408355]  [<c0132db8>] autoremove_wake_function+0x0/0x35
[14649.408368]  [<c015777a>] pdflush+0x0/0x1a3
[14649.408377]  [<c01574c1>] wb_kupdate+0x7f/0xe3
[14649.408410]  [<c0157887>] pdflush+0x10d/0x1a3
[14649.408425]  [<c0157442>] wb_kupdate+0x0/0xe3
[14649.408440]  [<c0132ce6>] kthread+0x3b/0x61
[14649.408447]  [<c0132cab>] kthread+0x0/0x61
[14649.408455]  [<c0104a27>] kernel_thread_helper+0x7/0x10
[14649.408473]  =======================
[24270.804919] BUG: at mm/page-writeback.c:829 __set_page_dirty_nobuffers()
[24270.804955]  [<c0156bb3>] __set_page_dirty_nobuffers+0x9a/0x104
[24270.804986]  [<c018cfea>] __block_write_full_page+0x1b7/0x2f1
[24270.805014]  [<e8ba89b4>] ...
From: Michal Piotrowski
Date: Wednesday, May 16, 2007 - 9:24 am

Almost every time when I try to run this script I hit a bug. I'm wondering why...
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-rc1-mm1/test_mount_fs.sh

[ 6666.713016] kernel BUG at /home/devel/linux-mm/include/linux/mm.h:288!
[ 6666.719690] invalid opcode: 0000 [#1]
[ 6666.723397] PREEMPT SMP
[ 6666.725999] Modules linked in: xfs loop pktgen ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd soundcore intel_agp agpgart snd_page_alloc i2c_i801 ide_cd cdrom rtc unix
[ 6666.776026] CPU:    0
[ 6666.776027] EIP:    0060:[<c01693ec>]    Not tainted VLI
[ 6666.776028] EFLAGS: 00010202   (2.6.22-rc1-mm1 #3)
[ 6666.788519] EIP is at put_page+0x44/0xee
[ 6666.792491] eax: 00000001   ebx: c549f728   ecx: c04b27e0   edx: 00000001
[ 6666.799345] esi: 00000000   edi: 00000080   ebp: d067e9e0   esp: d067e9c8
[ 6666.806208] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[ 6666.812104] Process mount (pid: 9419, ti=d067e000 task=d00a4070 task.ti=d067e000)
[ 6666.819486] Stack: d8980180 00000080 d067e9f0 d8980180 00000000 00000080 d067e9f0 fdc8eda3
[ 6666.828103]        fffffffc d8980180 d067ea20 fdc8f7ff fdc9b425 fdc96e5c 00080000 00000000
[ 6666.836635]        c549dfd0 00000200 ffffffff cd44b8e0 00002160 cd44b8e0 d067ea30 fdc78937
[ 6666.845253] Call Trace:
[ 6666.847939]  [<fdc8eda3>] xfs_buf_free+0x41/0x61 [xfs]
[ 6666.853247]  [<fdc8f7ff>] xfs_buf_get_noaddr+0x10c/0x118 [xfs]
[ 6666.859231]  [<fdc78937>] xlog_get_bp+0x65/0x69 [xfs]
[ 6666.864412]  [<fdc79e87>] xlog_write_log_records+0x73/0x20d [xfs]
[ 6666.870654]  [<fdc7a174>] xlog_clear_stale_blocks+0x153/0x15b [xfs]
[ 6666.877075]  [<fdc7a546>] ...
From: Andrew Morton
Date: Wednesday, May 16, 2007 - 9:41 am

static inline int put_page_testzero(struct page *page)
{
	VM_BUG_ON(atomic_read(&page->_count) == 0);
	return atomic_dec_and_test(&page->_count);
}

Looks like XFS did a free of an already-freed page.  There are a couple of
likely suspects in git-xfs.patch.

Does mainline do this?
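
For readers who do not have the source to hand, the check that fired can be sketched in userspace roughly as follows. This is only an illustration: struct fake_page and fake_put_page_testzero() are hypothetical names, a plain int stands in for the atomic counter, and a -1 return stands in for the BUG.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct page; only the refcount matters. */
struct fake_page {
	int count;
};

/*
 * Drop one reference.  Returns 1 when the last reference went away,
 * 0 when references remain, and -1 where the kernel check would fire,
 * i.e. when the page already has a zero count (a second free).
 */
int fake_put_page_testzero(struct fake_page *page)
{
	if (page->count == 0)
		return -1;
	return --page->count == 0;
}
```

Calling it twice on a page that starts with a single reference returns 1 and then -1, which is the double-free pattern the trace suggests.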

-

From: David Chinner
Date: Wednesday, May 16, 2007 - 7:06 pm

I haven't seen that one. I expect that it will be the noaddr buffer allocation

Yeah - that trace implies a memory allocation failure when allocating
log buffer pages and the cleanup looks like it does a double free
of the pages that got allocated. Patch attached below that should fix

I assume that the thread doing the mount got killed by the BUG and so the
normal error handling path on log mount failure was not executed and hence the
uuid for the filesystem never got removed from the table used to detect
multiple mounts of the same filesystem....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_buf.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_buf.c	2007-05-11 16:03:26.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_buf.c	2007-05-17 11:53:40.293585132 +1000
@@ -323,9 +323,16 @@ xfs_buf_free(
 		for (i = 0; i < bp->b_page_count; i++) {
 			struct page	*page = bp->b_pages[i];
 
-			if (bp->b_flags & _XBF_PAGE_CACHE)
+			/* handle noaddr allocation failure case */
+			if (!page)
+				break;
+
+			if (bp->b_flags & _XBF_PAGE_CACHE) {
 				ASSERT(!PagePrivate(page));
-			page_cache_release(page);
+				page_cache_release(page);
+			} else {
+				__free_page(page);
+			}
 		}
 		_xfs_buf_free_pages(bp);
 	}
@@ -766,6 +773,8 @@ xfs_buf_get_noaddr(
 		goto fail;
 	_xfs_buf_initialize(bp, target, 0, len, 0);
 
+	bp->b_flags |= _XBF_PAGES;
+
 	error = _xfs_buf_get_pages(bp, page_count, 0);
 	if (error)
 		goto fail_free_buf;
@@ -773,15 +782,14 @@ xfs_buf_get_noaddr(
 	for (i = 0; i < page_count; i++) {
 		bp->b_pages[i] = alloc_page(GFP_KERNEL);
 		if (!bp->b_pages[i])
-			goto fail_free_mem;
+			goto fail_free_buf;
 	}
-	bp->b_flags |= _XBF_PAGES;
 
 	error = _xfs_buf_map_pages(bp, ...
From: Christoph Hellwig
Date: Thursday, May 17, 2007 - 1:41 am

Yes.   xfs_buf_get_noaddr calls xfs_buf_free to free a buffer when
something fails.  But this is wrong - we want to call xfs_buf_deallocate
before we set up the page list, and if a page allocation fails we want to
do our own freeing of just the pages we allocated and call
_xfs_buf_free_pages.  Currently we do our own freeing _and_ call
xfs_buf_free which leads to this double free.
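
The unwinding rule described above (free only what you allocated, and do not also call a generic free routine on the same pages) can be sketched in userspace like this. It is not the XFS code: struct fake_buf, fake_get_noaddr() and the fail_at failure-injection parameter are made up for the illustration, and malloc()/free() stand in for alloc_page()/__free_page().

```c
#include <assert.h>
#include <stdlib.h>

#define NPAGES 4

/* Hypothetical stand-in for an xfs_buf: a page array plus bookkeeping. */
struct fake_buf {
	void *pages[NPAGES];
	int   frees;	/* how many page frees the error path performed */
};

/*
 * Allocate NPAGES pages; fail_at injects an allocation failure at that
 * index (pass -1 for no failure).  Returns 0 on success, -1 on failure.
 */
int fake_get_noaddr(struct fake_buf *bp, int fail_at)
{
	int i;

	bp->frees = 0;
	for (i = 0; i < NPAGES; i++)
		bp->pages[i] = NULL;

	for (i = 0; i < NPAGES; i++) {
		bp->pages[i] = (i == fail_at) ? NULL : malloc(64);
		if (!bp->pages[i])
			goto fail_free_mem;
	}
	return 0;

fail_free_mem:
	/* Free only the pages allocated so far, exactly once each. */
	while (--i >= 0) {
		free(bp->pages[i]);
		bp->pages[i] = NULL;
		bp->frees++;
	}
	/* Deliberately no generic "free everything" call on the same pages. */
	return -1;
}

/* Normal teardown for a successfully built buffer. */
void fake_buf_release(struct fake_buf *bp)
{
	int i;

	for (i = 0; i < NPAGES; i++) {
		free(bp->pages[i]);
		bp->pages[i] = NULL;
	}
}
```

With a failure injected at index 2, the error path frees pages 0 and 1 once each and leaves nothing behind for a second pass to free.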


Signed-off-by: Christoph Hellwig <hch@lst.de>


Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c	2007-05-17 09:34:44.000000000 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c	2007-05-17 09:36:53.000000000 +0200
@@ -792,8 +792,9 @@ xfs_buf_get_noaddr(
  fail_free_mem:
  	while (--i >= 0)
 		__free_page(bp->b_pages[i]);
+	_xfs_buf_free_pages(bp);
  fail_free_buf:
-	xfs_buf_free(bp);
+	xfs_buf_deallocate(bp);
  fail:
 	return NULL;
-

From: Michal Piotrowski
Date: Thursday, May 17, 2007 - 1:05 pm

Hi Christoph,


I applied your patch and I get another oops

[  261.491499] XFS mounting filesystem loop0
[  261.501641] Ending clean XFS mount for filesystem: loop0
[  261.507698] SELinux: initialized (dev loop0, type xfs), uses xattr
[  261.567441] XFS mounting filesystem loop0
[  261.573931] allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
[  261.582935] xfs_buf_get_noaddr: failed to map pages
[  261.592478] Ending clean XFS mount for filesystem: loop0
[  261.618543] SELinux: initialized (dev loop0, type xfs), uses xattr
[  261.691563] XFS mounting filesystem loop0
[  261.698927] allocation failed: out of vmalloc space - use vmalloc=<size> to increase size.
                                  ^^^^^^^^^^^^^^^^^^^^
                                  interesting

[  261.724829] xfs_buf_get_noaddr: failed to map pages
[  261.734049] Ending clean XFS mount for filesystem: loop0
[  261.741069] SELinux: initialized (dev loop0, type xfs), uses xattr
[  261.978728] XFS mounting filesystem loop0
[  262.205863] xfs_buf_get_noaddr: failed to map pages
[  262.212523] Ending clean XFS mount for filesystem: loop0
[  262.218084] SELinux: initialized (dev loop0, type xfs), uses xattr
[..]
[  265.842566] xfs_buf_get_noaddr: failed to map pages
[  265.848267] xfs_buf_get_noaddr: failed to map pages
[  265.856480] Ending clean XFS mount for filesystem: loop0
[  265.862260] SELinux: initialized (dev loop0, type xfs), uses xattr
[  265.921288] XFS mounting filesystem loop0
[  265.927123] xfs_buf_get_noaddr: failed to map pages
[  265.932575] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[  265.942886]  printing eip:
[  265.945665] fdc8e82a
[  265.948818] *pde = 00000000
[  265.952378] Oops: 0002 [#1]
[  265.955241] PREEMPT SMP 
[  265.957868] Modules linked in: xfs loop ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ...
From: David Chinner
Date: Thursday, May 17, 2007 - 7:11 pm

Yeah, looks like a vmalloc leak is occurring. I haven't noticed
it before because:

VmallocTotal: 137427898368 kB
VmallocUsed:   3128272 kB
VmallocChunk: 137424770048 kB

It takes a long time to leak enough vmapped space to run out on ia64...

That tends to imply we have a mapped buffer being leaked somewhere.
Interestingly, I don't see a memory leak so we must be freeing the
memory associated with the buffer, just not unmapping it first. Not
sure how that can happen yet.....

mount xfs
VmallocUsed:    177808 kB
unmount xfs
mount xfs
VmallocUsed:    178080 kB
unmount xfs
mount xfs
VmallocUsed:    178352 kB
unmount xfs
mount xfs
VmallocUsed:    178624 kB
unmount xfs
mount xfs
VmallocUsed:    178896 kB
unmount xfs
mount xfs
VmallocUsed:    179168 kB
unmount xfs
mount xfs
VmallocUsed:    179440 kB
unmount xfs
mount xfs
VmallocUsed:    179712 kB
unmount xfs
mount xfs
VmallocUsed:    179984 kB

Looks like we're leaking 272kB of vmalloc space on each mount/unmount
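
The 272kB figure is just the difference between consecutive VmallocUsed samples. A small sketch that checks the growth is constant across the whole run; the array holds the numbers quoted above, and constant_leak_kb() is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* VmallocUsed samples (kB) copied from the mount/unmount loop above. */
const long vmalloc_used_kb[] = {
	177808, 178080, 178352, 178624, 178896,
	179168, 179440, 179712, 179984,
};

#define NSAMPLES (sizeof(vmalloc_used_kb) / sizeof(vmalloc_used_kb[0]))

/*
 * Return the growth per cycle if every consecutive pair of samples
 * differs by the same amount, or -1 if the growth is not constant.
 */
long constant_leak_kb(const long *samples, size_t n)
{
	long step;
	size_t i;

	if (n < 2)
		return -1;
	step = samples[1] - samples[0];
	for (i = 2; i < n; i++)
		if (samples[i] - samples[i - 1] != step)
			return -1;
	return step;
}
```

For these nine samples the step is 272 kB on every cycle, 2176 kB over the eight cycles shown.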

Groan - ASSERT(0) is the error handling there for debug kernels. If we fail
to allocate an iclogbuf on a non-debug kernel, it will panic like this.

I'll deal with that later....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

From: David Chinner
Date: Monday, May 21, 2007 - 3:11 am

I've found what is going on here - kmem_alloc() is decidedly more
forgiving than manually built page arrays and vmap/vunmap. Prior
to this change we wouldn't have even leaked memory....

Christoph - this is an interaction with xfs_buf_associate_memory();
I'm not sure that what it is doing is at all safe now that it never gets
passed kmem_alloc()d memory - it works for the log recovery case
because we use it in pairs - once to shorten the buffer and then once
to put it back the way it was.

But that doesn't work for the log buffers (we never return them to their
original state) and the log wrap case looks to work mostly by accident
now (and could possibly lead to double freeing pages)....

It seems that what we really need with the new code is a xfs_buf_clone()
operation followed by trimming the range to what the secondary I/O needs
to span. This would work for the log buffer case as well. Your thoughts?

In the meantime, the following patch appears to fix the leak.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/xfs_log.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c	2007-05-21 19:51:18.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c	2007-05-21 19:57:30.960084657 +1000
@@ -1457,7 +1457,7 @@ xlog_sync(xlog_t		*log,
 	} else {
 		iclog->ic_bwritecnt = 1;
 	}
-	XFS_BUF_SET_PTR(bp, (xfs_caddr_t) &(iclog->ic_header), count);
+	XFS_BUF_SET_COUNT(bp, count);
 	XFS_BUF_SET_FSPRIVATE(bp, iclog);	/* save for later */
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_BUSY(bp);
-

From: Christoph Hellwig
Date: Monday, May 21, 2007 - 3:23 am

xfs_buf_associate_memory is a mess.  My original plan was to get rid of
it, but I left that out to keep the patchset small and easily reviewable,
which seems to have been a mistake.  My plan is the following:

 - xlog_bread and thus the whole buffer I/O path grows an iooffset
   parameter that specifies at which offset into the buffer we start
   the actual I/O.  That gets rid of all the xfs_buf_associate_memory
   uses in the log recovery code
 - add a buffer clone operation as suggested by you above, and use
   the offset in xlog_sync as well.

Until then your patch below looks fine.
   
-

From: David Chinner
Date: Tuesday, May 22, 2007 - 3:44 am

Perhaps a new field in the xfs_buf structure - that way call paths
don't need to grow extra parameters and potentially increase
stack usage. The read path tends to be at the top of the stack

I don't want to have to introduce a mempool just for one xfs_buf per
filesystem, so this would need to be able to take a xfs_buf (log->l_xbuf)
that it clones to.... 

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

From: Christoph Hellwig
Date: Tuesday, May 22, 2007 - 4:42 am

I have some patches to unwind the buffer I/O path, it's a little

Yes.  Note that we currently do a non-mempooled allocation for the page
array, which this would cure as well.
-

From: Nathan Scott
Date: Tuesday, May 22, 2007 - 4:23 pm

That'd be unfortunate - there are very few iclog buffers relative to
every other metadata buffer, so growing the struct for all of those
too would not be ideal (I remember Steve going on pagebuf shrinking
exercises in the distant past, to fit more of em in memory at once,
I can't remember what benchmark in particular he was using though).

cheers.

-- 
Nathan

-

From: Michal Piotrowski
Date: Tuesday, May 22, 2007 - 7:45 am

Hi David,


After a few minutes of mount/umount cycle everything seems to be ok,
problem fixed.


Regards,
Michal

-- 
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)
-

From: Randy Dunlap
Date: Wednesday, May 16, 2007 - 9:50 am

LZO build fails on allyesconfig:

lib/built-in.o: In function `lzo1x_1_compress':
lib/lzo/minilzo.c:724: multiple definition of `lzo1x_1_compress'
fs/built-in.o:fs/reiser4/plugin/compress/minilzo.c:1307: first defined here
ld: Warning: size of symbol `lzo1x_1_compress' changed from 1541 in fs/built-in.o to 244 in lib/built-in.o
lib/built-in.o: In function `lzo1x_decompress':
lib/lzo/minilzo.c:885: multiple definition of `lzo1x_decompress'
fs/built-in.o:fs/reiser4/plugin/compress/minilzo.c:1466: first defined here
ld: Warning: size of symbol `lzo1x_decompress' changed from 1047 in fs/built-in.o to 678 in lib/built-in.o
make: *** [.tmp_vmlinux1] Error 1
make: Target `all' not remade because of errors.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-

From: Richard Purdie
Date: Wednesday, May 16, 2007 - 10:00 am

Looks like reiser4 contains a copy of minilzo used as some kind of
compression plugin. It can be dropped in favour of the version in
lib/lzo/, they'll be compatible.

Andrew: Do you want a patch to remove it from reiser4?

Richard

-

From: Andrew Morton
Date: Wednesday, May 16, 2007 - 10:06 am

yes please.
-

From: Richard Purdie
Date: Wednesday, May 16, 2007 - 12:55 pm

Convert Reiser4 to use lzo implementation in lib/lzo/ instead of
including its own copy of minilzo.

Signed-off-by: Richard Purdie <rpurdie@openedhand.com>

---
[I've removed the deletion of minilzo.* and lzoconf.h from the LKML
version of this mail since it's not very interesting]

 fs/reiser4/Kconfig                    |    1 
 fs/reiser4/Makefile                   |    1 
 fs/reiser4/plugin/compress/Makefile   |    1 
 fs/reiser4/plugin/compress/compress.c |   22 
 fs/reiser4/plugin/compress/lzoconf.h  |  216 ---
 fs/reiser4/plugin/compress/minilzo.c  | 1967 ----------------------------------
 fs/reiser4/plugin/compress/minilzo.h  |   70 -
 7 files changed, 10 insertions(+), 2268 deletions(-)

Index: linux-2.6.21/fs/reiser4/Kconfig
===================================================================
--- linux-2.6.21.orig/fs/reiser4/Kconfig	2007-05-16 18:46:01.000000000 +0100
+++ linux-2.6.21/fs/reiser4/Kconfig	2007-05-16 18:49:09.000000000 +0100
@@ -3,6 +3,7 @@ config REISER4_FS
 	depends on EXPERIMENTAL
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
+	select LZO
 	select CRYPTO
 	help
 	  Reiser4 is a filesystem that performs all filesystem operations
Index: linux-2.6.21/fs/reiser4/Makefile
===================================================================
--- linux-2.6.21.orig/fs/reiser4/Makefile	2007-05-16 18:46:01.000000000 +0100
+++ linux-2.6.21/fs/reiser4/Makefile	2007-05-16 20:35:48.000000000 +0100
@@ -70,7 +70,6 @@ reiser4-y := \
 		   plugin/crypto/cipher.o \
 		   plugin/crypto/digest.o \
            \
-		   plugin/compress/minilzo.o \
 		   plugin/compress/compress.o \
 		   plugin/compress/compress_mode.o \
            \
Index: linux-2.6.21/fs/reiser4/plugin/compress/Makefile
===================================================================
--- linux-2.6.21.orig/fs/reiser4/plugin/compress/Makefile	2007-05-16 18:46:01.000000000 +0100
+++ linux-2.6.21/fs/reiser4/plugin/compress/Makefile	2007-05-16 18:48:42.000000000 +0100
@@ -2,5 +2,4 @@ ...
From: Richard Purdie
Date: Wednesday, May 16, 2007 - 1:00 pm

Sent.

I also noticed that reiser4 is using lzo1x_decompress(), not
lzo1x_decompress_safe(). The unsafe version is open to buffer overflows
through malicious data since it performs no validation of where it
writes output to. I'm not sure whether that's acceptable in filesystem
code, I'd suspect not?

Fixing it is a case of s/lzo1x_decompress(/lzo1x_decompress_safe(/ in 
fs/reiser4/plugin/compress/compress.c...
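
To make the difference concrete, here is a toy decoder showing the kind of output bounds check a "safe" variant adds. It deliberately does NOT implement the LZO format: it decodes a made-up run-length encoding of (count, byte) pairs, and rle_decode_safe() is a hypothetical name. An unchecked decoder would simply omit the marked test and write wherever the input tells it to.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy run-length decoder, NOT the LZO format: input is (count, byte) pairs.
 * Returns the number of bytes written, or -1 if the input is malformed or
 * the output would not fit in out_len bytes.
 */
long rle_decode_safe(const unsigned char *in, size_t in_len,
		     unsigned char *out, size_t out_len)
{
	size_t ip = 0, op = 0;

	if (in_len % 2 != 0)
		return -1;
	while (ip < in_len) {
		size_t run = in[ip];
		unsigned char byte = in[ip + 1];

		/* The check an unchecked decoder omits: never write past out. */
		if (run > out_len - op)
			return -1;
		while (run--)
			out[op++] = byte;
		ip += 2;
	}
	return (long)op;
}
```

With a 4-byte output buffer, well-formed input that expands to 4 bytes decodes normally, while a pair claiming a run of 200 bytes is rejected instead of overrunning the buffer.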

Richard


-

From: Edward Shishkin
Date: Friday, May 18, 2007 - 10:34 am

Ok, we will consider safe decompression;
moreover, as I remember, it doesn't lead to
a noticeable performance drop..

Thanks for this point,

-

From: Maciej Rutecki
Date: Wednesday, May 16, 2007 - 10:37 am

In 2.6.20.9 I can change trip points:

echo "105:100:100:78:70:40:30" > /proc/acpi/thermal_zone/TZ0/trip_points
echo 10  > /proc/acpi/thermal_zone/TZ0/polling_frequency

Then I got:
cat /proc/acpi/thermal_zone/TZ0/*
<setting not supported>
cooling mode:   active
polling frequency:       10 seconds
state:                   active[2]
temperature:             45 C
critical (S5):           105 C
active[0]:               78 C: devices=0xdf415a40
active[1]:               70 C: devices=0xdf4159dc
active[2]:               40 C: devices=0xdf41598c
active[3]:               30 C: devices=0xdf41593c

cat /proc/acpi/fan/*/*
status:                  off
status:                  off
status:                  on
status:                  on

And fan turns on.

In 2.6.22-rc1-mm1:
echo "105:100:100:78:70:40:30" > /proc/acpi/thermal_zone/TZ0/trip_points
bash: echo: write error: Input/output error

rutek:/home/maciek# cat /proc/acpi/thermal_zone/TZ0/*
<setting not supported>
polling frequency:       10 seconds
state:                   ok
temperature:             45 C
critical (S5):           256 C
active[0]:               78 C: devices=0xc1827a40
active[1]:               70 C: devices=0xc18279dc
active[2]:               60 C: devices=0xc182798c
active[3]:               50 C: devices=0xc182793c
rutek:/home/maciek# cat /proc/acpi/fan/*/*
status:                  off
status:                  off
status:                  off
status:                  off

Fan turns on when temperature is over 50*C. (want: 30)

I read this:
http://article.gmane.org/gmane.linux.acpi.devel/22750

But I don't have cooling_policy, only cooling_mode:
ls /proc/acpi/thermal_zone/TZ0/
cooling_mode  polling_frequency  state  temperature  trip_points

Is it a bug or a feature?

Config, acpidump, dmesg:
http://www.unixy.pl/maciek/download/kernel/2.6.22-rc1-mm1/

-- 
Maciej Rutecki
www.unixy.pl
Kernel ...
From: Chuck Ebbert
Date: Wednesday, May 16, 2007 - 10:47 am

Committed to mainline May 10:

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=11ccc0...
Commit:     11ccc0f249cb01a129f54760b8ff087f242935d4
Parent:     de46c33745f5e2ad594c72f2cf5f490861b16ce1
Author:     Len Brown <len.brown@intel.com>
AuthorDate: Mon Apr 30 22:36:01 2007 -0400
Committer:  Len Brown <len.brown@intel.com>
CommitDate: Mon Apr 30 22:36:01 2007 -0400

    ACPI: thermal trip points are read-only
-

From: Goulven Guillard
Date: Wednesday, May 16, 2007 - 11:10 am

Should one understand that this IS the intended behaviour?

Isn't it the DSDT's job (the DSDT is kernel-accessible, isn't it?) to
communicate trip_points to the ACPI thermal zone?

Isn't OSPM managing the thermal zone?

(http://acpi.sourceforge.net/documentation/thermal.html)




PS : Sorry for all these (maybe stupid) questions, but I think I
remember that changing trip_points had an effect on a (DSDT-bugged)
laptop I used to use, and I'd like to understand...

PPS : Sorry also for the english mistakes or approximations...




-- 
    ~~
   |Oo|   The ice pack is melting!!! Adopt a penguin...
  /|\/|\
   |__|            => http://doc.ubuntu-fr.org/
   ^__^
~~~|  |~~~








-

From: Pavel Machek
Date: Thursday, May 17, 2007 - 2:23 am

What was the rationale? Can we get this one reverted? 

Some machines (HP omnibook xe3) have broken trip points -- too high --
so machine will overheat and trigger hw shutdown before starting
passive cooling.

That's really broken, and write to trip points is reasonable way to
'fix' that. (I'd understand if you only ever let trip points to
decrease... but otoh root should be able to shoot himself....)

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Maciej Rutecki
Date: Thursday, May 17, 2007 - 6:36 am

Many people need to change trip points; for example I have:

cat /proc/acpi/thermal_zone/TZ0/trip_points  | grep critical
critical (S5):           256 C

I _must_ change it to below 105 C, or edit the DSDT table (too difficult for
me). I cannot use this kernel when trip points are read-only.

-- 
Maciej Rutecki
www.unixy.pl
Kernel Monkeys
(http://kernel.wikidot.com/)

From: Len Brown
Date: Thursday, May 17, 2007 - 12:08 pm

What bad things happen if you leave the critical trip point at 256?
Do you find that you can drive the temperature over 105 and
the system fails to shut down?

-Len

-

From: Maciej Rutecki
Date: Thursday, May 17, 2007 - 1:09 pm

It isn't a problem in this case (nx6310). But on the hp nc6220 the first trip
point is at 30 *C, so the fan is usually on (noise, power consumption).

-- 
Maciej Rutecki
www.unixy.pl
Kernel Monkeys
(http://kernel.wikidot.com/)

From: Maciej Rutecki
Date: Thursday, May 17, 2007 - 1:42 pm

From: Pavel Machek
Date: Thursday, May 17, 2007 - 2:53 pm

Something similar happened to me on XE3, yes.

(Actual values were different; BIOS specified critical temperature at
about 95C, but hw killed the power at about 83C. Setting critical trip
point at 80C made the problem go away.)
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Len Brown
Date: Thursday, May 17, 2007 - 3:42 pm

Great, please file a bug and include the acpidump from the XE3
and we'll fix it, rather than supporting a bogus (manual) workaround for it.

Of course if your system is running at 80*C and the hardware shuts
off at 83*C, you may have a broken fan, or one clogged with dust...

-Len

-

From: Pavel Machek
Date: Monday, May 21, 2007 - 5:11 am

It _did_ have broken fan. It also had broken trip points.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Len Brown
Date: Thursday, May 31, 2007 - 7:46 pm

Thanks for clarifying this, Pavel.
If you come upon an XE3 where Linux-2.6.22 doesn't work as well
as Windows, please let me know.

Given that the justification for this ill-conceived workaround
seems to have diminished to the memory of broken hardware,
it is clear that we should stay the course of removing it
so that it doesn't further confuse future users.

If SuSE violently disagrees with me, you are certainly empowered
to restore the workaround in your distribution starting at 2.6.22
as part of your value add.  However, given its history of confusing
users, it seems that it might increase your support burden rather
than decrease it.

-Len
-

From: Pavel Machek
Date: Monday, June 4, 2007 - 4:16 am

"work as well as windows" is not a good enough goal as far as I'm
concerned. Please don't break working setups.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Len Brown
Date: Thursday, May 17, 2007 - 12:17 pm

No, writing trip-points is neither a fix, nor is it reasonable.
It is a workaround at best, and it is a dangerous and mis-leading hack.

The OS has no capability to actually change the ACPI trip points
that are used by the BIOS.  Changing the OS copy of them
to make the user think that trip events will actually
happen when the temperature crosses the OS copy is crazy.

If there are systems with broken thermals and the
ACPI thermal control needs an over-ride to turn
on the fan, then that is fine -- but using
fake trip-points and giving the user the impression
that they are real is not viable.

-Len
-

From: Pavel Machek
Date: Thursday, May 17, 2007 - 2:52 pm

Aha... wait. It seemed to work for me when I enabled thermal
polling...

Slowing cpu down / shutdown / turn the fan on is done in the os after
all. Should we just start polling temperatures when user writes custom

They become real when we fake _TSP, too, ..?
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Len Brown
Date: Thursday, May 17, 2007 - 3:35 pm

That's exactly the point.
If you allow a user to think they over-rode a trip-point
but that trip point never fires unless they enable polling mode,
then they're not going to get what they asked for.

Yes, SuSE enables polling mode by default, but that is just

I actually agree with you for passively cooled embedded systems.
Indeed, that is the topic of one of my OLS papers.

However, for an off-the-shelf laptop that the vendor ships
with a specific active and passive cooling model, Linux
is not currently set up to ignore what the vendor provided
and go off on its own.  Yes, it could be done, but for

We are mis-using _TSP today, and over-riding it
is a hack on top of a bug...

_TSP is only supposed to be for the passive cooling
algorithm -- which by definition is polling based.
It is not intended to be used for active cooling at all.
That is what active trip points were invented for...

-Len
-

From: Stefan Seyfried
Date: Monday, June 4, 2007 - 2:02 am

I will do that for openSUSE FACTORY.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
-

From: Pavel Machek
Date: Monday, June 4, 2007 - 4:06 am

Well, I still believe right solution is to enable polling mode as soon
as trip points are written (and ignoring bios updates from then
on). That gets trip point writing into functional state.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Thomas Renninger
Date: Saturday, May 19, 2007 - 12:56 pm

Yes it is a workaround for critical ACPI bugs like that or similar:
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/22336

It's also convenient to e.g. lower passive trip point to avoid fan
noise.

Some people are used to it, I already wanted to write a little userspace
prog to use them as it is really easy to fake cooling_mode (trip points
are modified by BIOS) and eliminate fan noise and other things by e.g.
reducing the passive or whatever trip point.

This is at least a major sysfs interface change, has this been discussed
somewhere before or declared deprecated?

It's there for a long time, why is this "a dangerous and mis-leading
hack." now?

I'd suggest reverting this, and I can come up with something like an "only
allow lower values than BIOS provides" patch if the current implementation
is considered dangerous.
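
As a rough illustration of what an "only allow lower values than BIOS provides" rule could look like, the comparison itself is tiny. This is a hypothetical sketch, not a proposed patch: clamp_trip_point() is a made-up name, and it says nothing about how or where the kernel would apply the result.

```c
#include <assert.h>

/*
 * Hypothetical policy check: accept a user-requested trip point only if
 * it does not exceed the firmware-provided value.  Temperatures are in
 * degrees Celsius.  Returns the value that would take effect.
 */
int clamp_trip_point(int bios_c, int requested_c)
{
	return requested_c < bios_c ? requested_c : bios_c;
}
```

So lowering a 95 C trip point to 80 C would be accepted, while an attempt to raise an 85 C trip point to 95 C would leave it at 85 C.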



-

From: Len Brown
Date: Sunday, May 20, 2007 - 8:50 pm

Thanks for pointing that out -- it is a great example
of how powerful mis-information can be.

The fact that the trip-points are writable has obscured,
rather than clarified, the actual causes of the failures.
No less than 4 people in that bug report declared that
cleaning the dust out of their fan fixed the root cause.
A bunch more said that the issues went away when they 
stopped using ubuntu's user-space power save daemon.

There are a couple more with broken active fan control --
which also gets obscured rather than clarified by
over-riding trip points.

And finally, there are probably some with clean fans
that are working properly, but are thermally challenged
systems.  I'll venture that Windows is NOT modifying or disabling
the critical trip point to work around this issue.
I'll venture that their thermal throttling is working
and ours may not be.

perhaps it was the recently fixed mod_timer() bug in thermal.c,

nope, the OS can't reliably override the processor passive trip point.
That is what _SCP and cooling_mode are for.

The reason is that the BIOS can send us a trip-point changed event at any time,
the kernel will evaluate _PSV, and wipe out the modified OS version.

if you want to change the state of the fans,
then poke /proc/acpi/fan/ directly.
This will have effect until the next trip point



It has been dangerous and misleading since the day it went in.
If the user doesn't enable polling, then they are effectively
writing random numbers that have absolutely no effect on
the operation of the system, and hiding the numbers that

That simply will not address the issue.
Indeed, all the entries in the ubuntu bug report are about hitting
the critical temperature and having a critical shutdown when
it isn't wanted.  These people want to RAISE the critical shutdown
trip-point.  Their cooling problems must be fixed -- raising critical
trip points causes them instead to be ignored.

For folks with the reverse problem -- active cooling where the
fans ...
From: Thomas Renninger
Date: Monday, May 21, 2007 - 4:31 am

Whatever it was, it's in a final Ubuntu dist and the trip point interface
could help some people to still be able to use it.

ACPI is very machine specific. 100 machines may work well and QA might
overlook the hundred-and-first, where critical shutdowns or whatever happen.
Such workarounds are really helpful then.

Same for ignoring _PPC and thermal polling (the latter is always on in our
distro; I bet a lot of machines would break if it were disabled, and just
ripping out the ability to set it is really not a solution).

One big challenge in the ACPI subsystem (kernel or userspace) is to find
BIOS implementations that are at the limit of the specs or which violate
the specs and try to work around them.
We are not in the position of M$ (at least in the desktop/laptop
segment) yet.
BIOS developers won't follow our implementations and IMO we should go
the other way and provide more workarounds. If nobody needs them, the

Yes, it's not correct and those trip points might get overridden by BIOS
again on some machines. It still could help and doesn't hurt (Ok, one
should not increase the critical trip point, but that can be implemented...).

Again, pls go for more workarounds.

The most annoying situation for the developer and the user is when, after
investing a lot of time finding and possibly fixing a bug, you need to
tell the guy:
  - Got it, please wait for the next kernel release coming out in some
    weeks/months
  - Thanks for the work, but implementing it in the kernel of this distro
    version is too dangerous. Other machines might break (especially with
    ACPI bugs). Better you wait for the next distro version coming out in
    half a year.



-

From: Pavel Machek
Date: Monday, May 21, 2007 - 5:10 am

Heh, you suggest this? It is even less functional than current
solution -- which works okay as long as you keep thermal polling

You are misstating the situation. With thermal polling, it is pretty
much okay, and it is certainly better than "ride fans manually" hack

No. Manually turning off fans is even worse hack.
							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Matthew Garrett
Date: Monday, May 21, 2007 - 6:27 am

As Len says, the system can force a reevaluation of the trip points at 
any time which will wipe out the local settings. Either you ignore the 
spec and the notifications (potentially risking misbehaving hardware) or 


It's significantly more correct.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Pavel Machek
Date: Monday, May 21, 2007 - 6:29 am

Significantly more correct? It forces you to do all the thermal
management in userspace!
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Matthew Garrett
Date: Monday, May 21, 2007 - 6:36 am

Why's that a problem? Overriding the hardware policy has to be done 
somewhere, and doing it in userspace is no more dangerous than 
kernelspace.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Pavel Machek
Date: Monday, May 21, 2007 - 6:40 am

Duplicating all the kernel logic in userspace, badly?
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Matthew Garrett
Date: Monday, May 21, 2007 - 6:45 am

So don't do it badly. The advantage of doing so is that you can make it 
work properly, which you can't by putting it in the kernel.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Pavel Machek
Date: Monday, May 21, 2007 - 3:42 pm

You want stuff like critical shutdowns to work even if userspace is
dead.

I do not think you can control passive cooling adequately from
userspace, and you certainly cannot prevent the kernel from slowing
the machine down too soon.

Plus, this is actually a nasty user-visible change, and a regression
from 2.6.21. I am not sure why we are even debating this; the user-kernel
interface changed without warning. The patch should simply be reverted.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Matthew Garrett
Date: Monday, May 21, 2007 - 5:31 pm

I don't think anyone suggested putting the critical shutdown control in 

Given the choice between something impossible and something difficult, 

In http://lkml.org/lkml/2007/1/27/93 you were more than happy to break 
an interface even though it could be fixed in a (ugly) way that made it 
work again. Here, there's no way to fix this properly - the platform 
will quite happily do things based on what it believes the trip points 
should be, and one of those things may be to alter the trip points. 
Imagine the following situation:

1) Platform sets critical shutdown trip point to 85C
2) Userspace sets critical shutdown trip point to 95C
3) Temperature reaches 90C
4) Platform forces reevaluation of trip points
5) Entire invasion fleet is lost

How do you avoid that? Disable the ability for the platform to set trip 
points? You're breaking the spec and potentially causing hardware 
damage. If you have specific hardware that requires specific spec 
breakage, then a better approach would probably be to quirk the kernel 
to rectify it. On the other hand, if it works with the Other Leading OS, 
we ought to be able to just fix the problem properly.
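
The sequence above can be replayed with a toy model. This is only a sketch: the class and method names are made up for illustration and are not the kernel's ACPI thermal code, and the only behaviour modelled is that a forced reevaluation replaces the local override with the firmware's value.

```python
from dataclasses import dataclass


@dataclass
class ThermalZone:
    """Toy model of one thermal zone (hypothetical, not a kernel structure)."""
    firmware_critical_c: int  # critical trip point as the platform reports it
    kernel_critical_c: int    # critical trip point the OS currently acts on

    def user_override(self, value_c: int) -> None:
        # Userspace changes only the OS-side copy; the firmware is unaware.
        self.kernel_critical_c = value_c

    def firmware_reevaluate(self) -> None:
        # A platform notification makes the OS re-read the trip points,
        # which discards the local override.
        self.kernel_critical_c = self.firmware_critical_c

    def is_critical(self, temp_c: int) -> bool:
        return temp_c >= self.kernel_critical_c


def replay_scenario() -> list:
    zone = ThermalZone(firmware_critical_c=85, kernel_critical_c=85)  # step 1
    zone.user_override(95)                                            # step 2
    before = zone.is_critical(90)                                     # step 3
    zone.firmware_reevaluate()                                        # step 4
    after = zone.is_critical(90)                                      # step 5
    return [before, after]
```

Here `replay_scenario()` returns `[False, True]`: at 90C the zone is not critical under the override, and becomes critical the moment the platform forces the reevaluation.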
-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Pavel Machek
Date: Tuesday, May 22, 2007 - 2:06 am

No it does not. That is what this thread is about.

(On old xe3, critical trip point set by BIOS is ~95C, but machine dies
by hw safeguard at ~83C. Workaround is to lower critical trip point to

We need to ignore trip point updates from the BIOS, and we need to poll
thermals when the user overrides trip points. That's expected. Plus I've
yet to see a platform actually updating the trip points.

Speaking about hw damage... The broken BIOS on xe3 definitely caused
damage to its harddrive, so... we are preventing hw damage here.

(Plus, Len's patch broke the user-kernel interface in a stable series, without warning).
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Matthew Garrett
Date: Tuesday, May 22, 2007 - 2:16 am

Try any recent HP bios.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Goulven Guillard
Date: Tuesday, May 22, 2007 - 2:28 am

man cron... ;-)

-- 
    ~~
   |Oo|   La banquise fond !!! Adoptez un pingouin...
  /|\/|\
   |__|            => http://doc.ubuntu-fr.org/
   ^__^
~~~|  |~~~
-

From: Maciej Rutecki
Date: Tuesday, May 22, 2007 - 3:05 am

Matthew Garrett wrote:

Yes...

hp nx 6310, BIOS versions:
F.06: cpufreq works, MCFG BIOS error in dmesg (PCI: BIOS Bug: MCFG area
at f8000000 is not E820-reserved)
F.08: as above, plus cpufreq is broken
F.09: removes these errors, but reboot takes too long (removing the
psmouse module doesn't help); some people report this (I didn't test it)
F.0B: suspend to RAM is broken; after suspend to disk the keyboard doesn't work
F.0D: I don't have the heart to test it...

-- 
Maciej Rutecki
http://www.maciek.unixy.pl
-

From: Stefan Seyfried
Date: Monday, June 4, 2007 - 2:13 am

Thinkpad 600, whenever a trip point is crossed, all trip points are updated.
I think they implemented hysteresis that way.
ISTR that the hp nx5000 did something similar, but I might be wrong on this one.
-- 
Stefan Seyfried
QA / R&D Team Mobile Devices        |              "Any ideas, John?"
SUSE LINUX Products GmbH, Nürnberg  | "Well, surrounding them's out." 

This footer brought to you by insane German lawmakers:
SUSE Linux Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
-

From: Thomas Renninger
Date: Thursday, May 24, 2007 - 7:16 am

Stripping some CCs, acpi and kernel list should be enough this one goes
to...


I doubt it is impossible, would you mind sharing your knowledge why you
think it is impossible or point to some related discussion, pls.

Does this mean checking temperature against trip points and adjust fan
and cpufreq should be done in a hal module?
In which stage is this, rfc, development, already in some git tree?

Yes, trip points are overridden by BIOS on HPs and what is the problem?
The workaround won't work for them, but it still does on others
(mainly on ThinkPads which have passive tp at about 89 C and critical on
91 C).

I could imagine an implementation for this, that e.g. critical...active9
get module parameters. BIOS updates for trip points get ignored as soon
as one is set and you can only decrease a value. Nothing bad can happen
and it will make some people happy (yes it's hacky, violates the specs
and so on..., but some more people have a working machine). Will this
(or similar) get accepted?
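
A minimal sketch of the policy described above, to make the two rules concrete: an override may only lower a trip point, and BIOS updates are ignored once any override is set. The class and its method names are hypothetical, not an existing kernel or module-parameter interface; the example temperatures are the ThinkPad values mentioned earlier.

```python
class TripPointOverride:
    """Hypothetical policy object; names are illustrative, not a kernel API."""

    def __init__(self, bios_values_c):
        # Trip points as first reported by the BIOS, in degrees Celsius.
        self.values_c = dict(bios_values_c)
        self.overridden = False

    def set_override(self, name, value_c):
        # Raising a trip point is refused, so an override can never delay
        # cooling or shutdown beyond what the BIOS asked for.
        if name not in self.values_c:
            raise KeyError(name)
        if value_c > self.values_c[name]:
            return False
        self.values_c[name] = value_c
        self.overridden = True
        return True

    def bios_update(self, new_values_c):
        # Once any override is set, later BIOS updates are ignored.
        if self.overridden:
            return False
        self.values_c.update(new_values_c)
        return True


tp = TripPointOverride({"passive": 89, "critical": 91})
raised = tp.set_override("critical", 95)    # refused: would raise the limit
lowered = tp.set_override("critical", 85)   # accepted
ignored = tp.bios_update({"critical": 91})  # ignored after the override
```

After these calls `raised` is `False`, `lowered` is `True`, `ignored` is `False`, and the critical trip point stays at 85. The sketch does not address the objection that the firmware may still act on its own idea of the trip points.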

It's even more impossible to get ACPI working correctly for all machines
and all subsystems, these little workarounds can help some people to at
least use their machine or get some parts working better.

   Thomas

-

From: Matthew Garrett
Date: Thursday, May 24, 2007 - 7:36 am

Because, as Len has pointed out, you end up with two different ideas 
about what the trip points are - the kernel's and the hardware's. That 
works fine until some event in the firmware either forcibly 
resynchronises the two or makes assumptions about the spec-compliance of 

You don't know whether the workaround will work or not until you've 
performed a full audit of the platform firmware, which is going to 
potentially change between BIOS versions. It's entirely legal for the 
firmware to behave in this way, and even beneficial under various 

The interface would need to be more complicated than that if you wanted 
to be able to implement hysteresis, and there's the potential for 
hardware damage if parameters are set inappropriately. Even then, 
there's no easy way of programmatically determining whether it would work 

It's fairly clearly not impossible, given that there exists at least one 
OS that these machines work with.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-

From: Thomas Renninger
Date: Thursday, May 24, 2007 - 11:18 am

Not sure what exactly you'd like to do in userspace, maybe you can be a
bit more precise here:
  a) Doing whole thermal management in userspace, reading temp, writing
     fan and cpufreq_max_freq, shutting down machine,...
  b) Workaround not switching on fans by double checking fan/temperature
     by a userspace daemon and try to finally trigger the switch by 
     writing to /proc/acpi/fan/state (or corresponding /sys,..)

IMO we need some kind of fan watchdog like Henrique described
recently; maybe this could be put in userspace, not sure.
Currently the fan can run out of sync easily if the fan state is

Hmm, I don't get the point. If it works it's great, if not you have a

But that's exactly what all these workarounds are for. You pass them if
you have a buggy BIOS. You wait for new BIOSes and hope that you can get

The fact that 3 people complained rather quickly about a patch in rc1-mm1
suggests that this workaround is needed. I personally advised two
guys to use it with their ThinkPads in the summer and they are happy with
it.

I'd also like to have this a bit extended: be able to just modify
passive trip point.
IMO this is a very powerful feature allowing people a fanless system as
long as they have a cpufreq capable processor.

The idea having this in userspace is interesting. But as said rather
complicated to implement. The hysteresis implementation for passive
cooling works fine in kernel and is field tested, it should get used.

The problem with the ACPI spec is that it's rather complicated. IMO this
is mainly so from a BIOS developer's point of view, as far as I can tell.
Therefore it's rather seldom picked up by BIOS vendors.
However, for the kernel it's easy (to fake, to do) and it's working fine,
so why not make use of it?

IMO we should even provide a passive trip point (initially unused) when
none is defined by the BIOS.

I agree that it's hard to find the temperature to not let the fan kick
in automatically. But it's really easy then for everyone ...

...and suggested workaround is to drive fans directly from userspace,
which not only violates the specs and has all the problems with

Not sure why you try to scare people with 'hardware damage'. HP XE3
bios already _was_ damaging hardware (it cooked the hard drive using
cpu as a heater), and no acpi magic can damage a correctly working
machine.
							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-


I don't think that's obviously true. 11.3.2 of the 3.0 spec states:

"A package consisting of references to all active cooling devices that 
should be engaged when the associated active cooling threshold (_ACx) is 
exceeded." 


Given that this presumably didn't occur under Windows, I think it would 
be significantly better to figure out why and then fix that. 
Alternatively, if the firmware tables are actually genuinely broken in a 
way that's impossible to repair, you can replace the table. That has the 
advantage that there's no risk of the platform and the OS becoming 
confused.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
-


We'd need:

a) way to tell acpi not to control fans any more

b) in kernel watchdog so that acpi starts controlling fans after oom
killer

c) way to control passive cooling from userspace.


It would happily occur under Windows. You just needed to load the machine
in a way that kept the CPU at ~80C.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-


So replace the DSDT. All the problems get solved that way.
-- 
Matthew Garrett | mjg59@srcf.ucam.org
-


We are in the middle of stable series, and Len's patch breaks existing
setups without prior warning. That's a "no-no". Of course I could
replace DSDT. I also could throw that machine out of window.

I'm not sure what we are arguing about here, that patch is broken.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Adrian Bunk
Date: Wednesday, May 16, 2007 - 11:55 am

This breaks the compilation of the oldest of our IDE disk drivers:

<--  snip  -->

...
  LD      .tmp_vmlinux1
drivers/built-in.o: In function `hd_init':
hd.c:(.init.text+0x44a7d): undefined reference to `drive_info'
hd.c:(.init.text+0x44a89): undefined reference to `drive_info'
hd.c:(.init.text+0x44a95): undefined reference to `drive_info'
hd.c:(.init.text+0x44aa1): undefined reference to `drive_info'
hd.c:(.init.text+0x44aad): undefined reference to `drive_info'
drivers/built-in.o:hd.c:(.init.text+0x44ab9): more undefined references to `drive_info' follow
make[1]: *** [.tmp_vmlinux1] Error 1

<--  snip  -->

Considering the fact that we have two more recent drivers with the same 
functionality, it might be an option to simply remove this driver...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-

From: Bartlomiej Zolnierkiewicz
Date: Wednesday, May 23, 2007 - 4:45 pm

Hi,


Care to send a patch?

Thanks,
Bart
-

From: Alan Cox
Date: Thursday, May 24, 2007 - 3:55 am

hd.c can drive MFM and RLL disks and drivers/ide cannot. Although it
really wants burying further down the config tree the ability to read MFM
and RLL disks when recovering ancient data is useful and people do
actually use this driver now and then rescuing stuff like twenty year old
datasets.

It thus needs fixing not removing.


Alan
-

From: H. Peter Anvin
Date: Thursday, May 24, 2007 - 11:53 am

Why is this driver parked in drivers/ide/legacy when the companion
driver, xd.c, is in drivers/block (where hd.c used to be at one point,
too)?  Especially so since it's not really for IDE, but for ST-506.

HOWEVER, the code that fails above hard-assumes that the ST-506 disks
that you have are your primary system drives, which is obviously a wrong
assumption -- ST-506 drives were obsolete quite a while before Linux
existed[1].

xd.c, on the other hand, seems to actually go out and query the hardware
directly.  I guess this is understandable, since this controller would
never have been primary.

If hd.c is pure legacy, which it obviously is, should we remove the code
to assume the BIOS settings are the MFM/RLL settings (i.e. the __i386__
clause), and instead do something more like the __arm__ clause which
means that "if you really want to use this you have to specify the
parameters manually"?

	-hpa


[1] The 386-16 that I had access to at Northwestern, which with 0.59
BogoMIPS was the slowest Linux system in existence until Linux was
ported to other architectures, might have had an ST-506 drive, but I'm
not sure.
-

From: H. Peter Anvin
Date: Thursday, May 24, 2007 - 5:05 pm

From: Alan Cox
Date: Thursday, May 24, 2007 - 5:14 pm

I believe the technical description for the comment is "bullshit" 8)

Almost all MFM controllers and RLL controllers will only run at the
standard primary and secondary ATA address.

Given the intended use of the driver today I don't see a big problem in
requiring "hd=" although you have to question the point of this boot code
rewrite when it seems primarily to be removing features.

Alan
-

From: H. Peter Anvin
Date: Thursday, May 24, 2007 - 5:18 pm

Yes, but that doesn't (necessarily) apply to the controller that is
likely to be the primary controller in a modern system.

The whole point is that what the BIOS considers primary isn't
necessarily tied to the standard ATA addresses anymore, with SATA
controllers being primary.

The question I'm asking is: do you think it's better to remove this from
hd.c, or do you think it's better to add back the boot code BIOS
detection (and take the risk of poking an ST-506 disk with legacy data
with parameters which may belong to another disk -- keep in mind this

I've been trying to remove features that are obsolete and/or broken.  I
don't have access to this particular ancient hardware, nor any system
that can even host them.   It's very easy to add the stuff back in the
boot code; it's a much more tricky/annoying question if one *should* do
so.  That's part of a rewrite/cleanup.

	-hpa
-

From: Alan Cox
Date: Thursday, May 24, 2007 - 5:38 pm

To set it up the user will have to know the parameters and have typed
them into the BIOS (if it even has an option for it). I see no problem

-

From: H. Peter Anvin
Date: Thursday, May 24, 2007 - 5:51 pm

Sorry, see no problem which way?  My concern here is with getting
incorrect data, not getting no data.  The BIOS probe amounts to pulling
data out of two tables (INT 0x41/0x46, corresponding to BIOS drives 0x80
and 0x81 -- the EDD 1.1 spec is quite specific that if implemented they
follow the BIOS drive numbers, not the ATA port addresses), and hoping
that they actually match the drives that hd.c uses.  That scares me,
since we're talking about old legacy data here.

I'm not concerned with what's easy, I'm concerned with what's the right
thing to do.

	-hpa

-

From: Alan Cox
Date: Friday, May 25, 2007 - 7:19 am

Forcing the user to provide the geometry. Historically that driver dealt
with the main disks the user had. Today its only use is specialist
recovery work. Anyone recovering a disk has to get the geometry data into
the BIOS (if the BIOS even allows it - many now don't) and will therefore
know it for hd= arguments as well.
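
As an illustration of what the user would have to supply, here is a small sketch that parses a geometry string and derives the disk size from it. It assumes the option takes the form `hd=<cyl>,<head>,<sect>` and that sectors are 512 bytes; the helper names and the example values are made up for the illustration and are not taken from hd.c.

```python
from typing import NamedTuple

SECTOR_BYTES = 512  # assumption: 512-byte sectors


class Geometry(NamedTuple):
    cylinders: int
    heads: int
    sectors: int

    def capacity_bytes(self) -> int:
        # Total size implied by the cylinder/head/sector geometry.
        return self.cylinders * self.heads * self.sectors * SECTOR_BYTES


def parse_hd_option(option: str) -> Geometry:
    """Parse a string such as 'hd=615,4,17' into a Geometry."""
    prefix = "hd="
    if not option.startswith(prefix):
        raise ValueError("expected an option starting with 'hd='")
    fields = option[len(prefix):].split(",")
    if len(fields) != 3:
        raise ValueError("expected three comma-separated numbers")
    cyl, head, sect = (int(f) for f in fields)
    if min(cyl, head, sect) <= 0:
        raise ValueError("geometry values must be positive")
    return Geometry(cyl, head, sect)


geom = parse_hd_option("hd=615,4,17")
size = geom.capacity_bytes()  # 615 * 4 * 17 * 512 = 21411840 bytes
```

The point of the sketch is that nothing is guessed: a wrong or missing value is rejected rather than replaced with whatever the BIOS tables happen to contain.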

Alan
-

From: Reuben Farrelly
Date: Thursday, May 17, 2007 - 5:38 am

I have just seen this on boot, with 2.6.22-rc1-mm1 on x86_64:

--

libata version 2.20 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
BUG: at include/linux/slub_def.h:88 kmalloc_index()

Call Trace:
  [<ffffffff8034f3f9>] pci_dev_put+0x12/0x14
  [<ffffffff80283f30>] get_slab+0xb5/0x265
  [<ffffffff802841bc>] __kmalloc+0x13/0xa3
  [<ffffffff8021a4aa>] cache_k8_northbridges+0x80/0x116
  [<ffffffff8063fed2>] gart_iommu_init+0x16/0x594
  [<ffffffff804562ac>] genl_rcv+0x0/0x68
  [<ffffffff804548ed>] netlink_kernel_create+0x15e/0x16b
  [<ffffffff804acc52>] mutex_unlock+0x9/0xb
  [<ffffffff80639fad>] pci_iommu_init+0x9/0x12
  [<ffffffff806306af>] kernel_init+0x152/0x322
  [<ffffffff80249c7c>] trace_hardirqs_on+0xc0/0x14e
  [<ffffffff804ae03d>] trace_hardirqs_on_thunk+0x35/0x37
  [<ffffffff80249c7c>] trace_hardirqs_on+0xc0/0x14e
  [<ffffffff8020a848>] child_rip+0xa/0x12
  [<ffffffff80209f5c>] restore_args+0x0/0x30
  [<ffffffff8063055d>] kernel_init+0x0/0x322
  [<ffffffff8020a83e>] child_rip+0x0/0x12

PCI-GART: No AMD northbridge found.
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
ACPI: RTC can wake from S4
pnp: 00:01: iomem range 0xf0000000-0xf3ffffff has been reserved
pnp: 00:01: iomem range 0xfed13000-0xfed13fff has been reserved

--

The full dmesg is at http://www.reub.net/files/kernel/2.6.22-rc1-mm1-dmesg and 
the config up at http://www.reub.net/files/kernel/2.6.22-rc1-mm1-config

The machine otherwise seems to run OK.

Reuben
-

From: Satyam Sharma
Date: Thursday, May 17, 2007 - 5:52 am

This ( http://lkml.org/lkml/2007/5/16/350 ) patch by Ben Collins
submitted yesterday should take care of this.

Thanks,
Satyam
-

From: Mariusz Kozlowski
Date: Sunday, May 20, 2007 - 3:12 am

Hello,

	I tried it on iMac G3. I got a bunch of warnings
and finally it failed to build.

WARNING: "fee_restarts" [arch/powerpc/kernel/built-in] is COMMON symbol
WARNING: "ee_restarts" [arch/powerpc/kernel/built-in] is COMMON symbol
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from dt_string_start (offset 0x8)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from dt_string_end (offset 0xc)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from prom_entry (offset 0x10)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from prom (offset 0x3c)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from of_platform (offset 0x50)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from mem_reserve_cnt (offset 0x58)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from mem_reserve_map (offset 0x60)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from alloc_bottom (offset 0x64)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from ram_top (offset 0x68)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from alloc_top (offset 0x70)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from prom_scratch (offset 0x8c)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from dt_header_start (offset 0xbc)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from dt_struct_start (offset 0xc4)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to .init.data:.got2 from dt_struct_end (offset 0xcc)
WARNING: arch/powerpc/kernel/built-in.o - Section mismatch: reference to ...
-

From: Sam Ravnborg
Date: Sunday, May 20, 2007 - 3:21 am

....

Most, but not all, of these warnings should be gone when
Linus pulls kbuild-fix.git.
When -rc3 is ready, can you then please post the result of a build?
Then I can take a look at the remaining section mismatch warnings.

	Sam
-

From: Kumar Gala
Date: Sunday, May 20, 2007 - 8:33 am

Also, I've got fixes for the COMMON symbol warnings.

- k
-

From: Joseph Fannin
Date: Tuesday, May 22, 2007 - 12:25 am

I've been getting this since 2.6.21-rc7-mm1:

[    2.379310] BUG: unable to handle kernel paging request at virtual address 4400d340
[    2.379491]  printing eip:
[    2.379573] c021c978
[    2.379656] *pdpt = 000000000353c001
[    2.379739] *pde = 0000000000000000
[    2.379824] Oops: 0000 [#1]
[    2.379906] PREEMPT SMP
[    2.380059] Modules linked in: thermal processor dm_mod
[    2.380288] CPU:    0
[    2.380289] EIP:    0060:[<c021c978>]    Not tainted VLI
[    2.380291] EFLAGS: 00010297   (2.6.22-rc1-mm1 #2)
[    2.380547] EIP is at vsnprintf+0x448/0x5d0
[    2.380633] eax: 4400d340   ebx: c348f034   ecx: 4400d340   edx: fffffffe
[    2.380721] esi: c03e0100   edi: 4400d340   ebp: c357ecc0   esp: c357ec68
[    2.380810] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[    2.380898] Process udevtrigger (pid: 686, ti=c357e000 task=c1876df0 task.ti=c357e000)
[    2.380987] Stack: c348f014 00000fec c03e1c60 c03e3cec c357eccc c0499b88 c357ece0 c0282513
[    2.381428]        c348f014 00000fec 3cb70fcb c348f034 ffffffff 00000000 ffffffff ffffffff
[    2.381867]        ffffffff fffffffe c03e017c c357ed18 00000034 c0494a20 c357ece0 c021cb9f
[    2.382305] Call Trace:
[    2.382470]  [<c021cb9f>] sprintf+0x1f/0x30
[    2.382594]  [<c02815ed>] show_uevent+0xed/0x130
[    2.382720]  [<c0281163>] dev_attr_show+0x23/0x30
[    2.382843]  [<c01dc077>] sysfs_read_file+0x97/0x140
[    2.382968]  [<c019502f>] vfs_read+0xaf/0x180
[    2.383096]  [<c0198c3a>] kernel_read+0x3a/0x50
[    2.383221]  [<c01f126c>] evm_calc_hash+0x11c/0x240
[    2.383347]  [<c01efd39>] evm_file_free+0xb9/0x330
[    2.383470]  [<c0195a3a>] __fput+0xba/0x180
[    2.383593]  [<c0195c32>] fput+0x22/0x40
[    2.383715]  [<c0192e07>] filp_close+0x47/0x70
[    2.383839]  [<c0194109>] sys_close+0x69/0xc0
[    2.383965]  [<c01043c8>] syscall_call+0x7/0xb
[    2.384092]  [<b7ebd0a7>] 0xb7ebd0a7
[    2.384212]  =======================
[    2.384295] INFO: lockdep is turned off.
[    2.384379] Code: 21 fd ff ff c6 ...
-

From: Andrew Morton
Date: Tuesday, May 22, 2007 - 2:23 pm

On Tue, 22 May 2007 03:25:48 -0400

OK, thanks.  Does the crash go away if you disable IMA, SLIM, etc in .config?

I think I'll drop all those patches, actually - they don't seem to be going
anywhere.

-

From: Mimi Zohar
Date: Friday, May 25, 2007 - 2:05 pm

relevant:

You are absolutely right, we have been stalled on EVM/IMA/SLIM, while
trying to figure out the mtime and revocation issues. In retrospect we
tried to submit too much complex code all at once.

We will resubmit in small functional pieces as the technical issues have
been resolved, starting with the LIM API and hooks, which are independent
of the mtime and revocation issues.

Mimi Zohar


-
