Re: [PATCH 2/2] [MTD] Call bdi_init() and bdi_register()

Previous thread: Re: [PATCH] usb: core: config.c: usb_get_configuration() simplified by Alan Stern on Saturday, April 17, 2010 - 9:13 am. (1 message)

Next thread: [PATCH 5/7] viafb: improve misc register handling by Florian Tobias Schandinat on Saturday, April 17, 2010 - 12:44 pm. (1 message)
From: Jörn Engel
Date: Saturday, April 17, 2010 - 11:40 am

Moin David,

if I read the code correctly, JFFS2 will happily perform a NOP on
sys_sync() again.  And this time it appears to be Jens' fault.

JFFS2 does not appear to set s_bdi anywhere.  And as of 32a88aa1,
__sync_filesystem() will return 0 if s_bdi is not set.  As a result,
sync_fs() is never called for jffs2 and whatever remains in the wbuf
will not make it to the device.

The patch also adds a BUG_ON to catch this problem in any remaining or
future offenders.  I am not sure about network filesystems, but at
least bdev- and mtd-based ones should be caught.

Opinions?

Jörn

-- 
No art, however minor, demands less than total dedication if you want
to excel in it.
-- Leon Battista Alberti

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index af8b42e..7c00319 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -13,6 +13,7 @@
 #include <linux/mtd/super.h>
 #include <linux/namei.h>
 #include <linux/ctype.h>
+#include <linux/slab.h>
 
 /*
  * compare superblocks to see if they're equivalent
@@ -44,6 +45,7 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
 
 	sb->s_mtd = mtd;
 	sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+	sb->s_bdi = mtd->backing_dev_info;
 	return 0;
 }
 
diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,10 +954,12 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+			(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));
 
- 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
- 	if (error)
- 		goto out_sb;
+	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
+	if (error)
+		goto out_sb;
 
 	/*
 	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE
--

From: Jens Axboe
Date: Monday, April 19, 2010 - 12:38 am

I think that BUG_ON() would be a lot better as a printk() and fail mount
instead. There's really little point in killing the kernel for something
you could easily warn about and recover nicely.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 19, 2010 - 3:15 am

*shrug*
The BUG_ON directly above is not qualitatively different.  In both cases
the only solution is to find and fix the bug in some other file,
recompile and try again.  But ultimately I don't care, as long as we
catch the bug before people lose their data.

Feel free to respin this patch.  You caused the problem in the first
place. ;)

For the record, while I consider the two-liner that causes this mess a
real turd, the rest of your patch was brilliant.  A shame I didn't catch
this earlier.

Jörn

-- 
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace
--

From: Jens Axboe
Date: Monday, April 19, 2010 - 3:20 am

Thanks, we definitely should have put a debug statement to catch this in
from day 1, good debugging should be an important part of any new
infrastructure.

Care to send your jffs2 patch separately to David? Then I'll commit a
modified variant for complaining about missing ->s_bdi on mount.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 19, 2010 - 4:39 am

Sure.

David, this patch is untested.  It looks trivially correct and fixes a
nasty bug, but I don't test jffs2 and only noticed the problem in
passing.

Jörn

-- 
Maintenance in other professions and of other articles is concerned with
the return of the item to its original state; in Software, maintenance
is concerned with moving an item away from its original state.
-- Les Belady

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index af8b42e..7c00319 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -44,6 +45,7 @@ static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
 
 	sb->s_mtd = mtd;
 	sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+	sb->s_bdi = mtd->backing_dev_info;
 	return 0;
 }
 
--

From: Jörn Engel
Date: Wednesday, April 21, 2010 - 10:54 pm

Woke up early and had another look at this.  Looks like a much more
widespread problem.  Based on a quick grep an uncaffeinated brain:

9p	no s_bdi
afs	no s_bdi
ceph	creates its own s_bdi
cifs	no s_bdi
coda	no s_bdi
ecryptfs no s_bdi
exofs	no s_bdi
fuse	creates its own s_bdi?
gfs2	creates its own s_bdi?
jffs2	patch exists
logfs	fixed now
ncpfs	no s_bdi
nfs	creates its own s_bdi
ocfs2	no s_bdi
smbfs	no s_bdi
ubifs	creates its own s_bdi

I excluded all filesystems that appear to be read-only, block device
based or lack any sort of backing store.  So there is a chance I have
missed some as well.

Jörn

-- 
Simplicity is prerequisite for reliability.
-- Edsger W. Dijkstra
--

From: Jörn Engel
Date: Wednesday, April 21, 2010 - 11:26 pm

Linus,

I think this is bad enough that you should be involved.  32a88aa1 broke
a number of filesystems in a way that sync() would return 0 without
doing any work.  Even politicians are better at keeping the promises.

This is caused by the two-liner in __sync_filesystem:
	if (!sb->s_bdi)
		return 0;
s_bdi is set implicitly for all filesystems using set_bdev_super(), so
most block device based filesystems are safe.  There are, however, a
number of odd-balls around:


Obviously this list should get checked and all affected filesystems get
repaired.  Additionally we should add an assertion and BUG() or refuse
to mount or something.  My original patch to that extend was this:

diff --git a/fs/super.c b/fs/super.c
index f35ac60..e8af253 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -954,6 +954,8 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi &&
+			(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd));
 
  	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
  	if (error)
  		goto out_sb;

The real problem is finding a condition that has neither false positives
nor false negatives.  The "(mnt->mnt_sb->s_bdev || mnt->mnt_sb->s_mtd)"
part takes care of false positives like tmpfs, but it would catch none
of the network filesystems.  Should we instead annotate tmpfs and friends
with something like sb->s_dont_need_bdi?  It is the only way I can think
of not to miss something.

Jörn

-- 
People will accept your ideas much more readily if you tell them
that Benjamin Franklin said it first.
-- unknown

--

From: Linus Torvalds
Date: Thursday, April 22, 2010 - 7:25 am

Umm. Why not just remove the two-liner? It was incorrect. The comment says 
"this should be safe", and if it wasn't, then the commit that caused this 
all was total crap to begin with.

Why even discuss any other measures?

		Linus
--

From: Linus Torvalds
Date: Thursday, April 22, 2010 - 7:33 am

Grr. Ok, so we do need it. Because Jens made it not work without it, and 
didn't fix up the filesystems, just added a random comment saying "we 
shouldn't need to".

Double-grr. I hate misleading comments. It makes the patch and the code 
look like people knew what they were doing.

Jens - please help fix this up.

		Linus
--

From: Jens Axboe
Date: Thursday, April 22, 2010 - 9:27 am

Yeah sorry, that part was apparently not well thought through. It did go

Of course, I already posted a series of patches to fix this up. I want
to test them a bit, and I'll send them in tomorrow.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Thursday, April 22, 2010 - 1:33 pm

How about something like this to catch future cases?  It compiles and
survived a test boot, so it does seem to work for the common cases like
tmpfs, procfs, etc.

Jens, you know the bdi code 10x better than me, would this work?

Jörn

-- 
ticks = jiffies;
while (ticks == jiffies);
ticks = jiffies;
-- /usr/src/linux/init/main.c

noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.

Signed-off-by: Joern Engel <joern@logfs.org>

diff --git a/fs/super.c b/fs/super.c
index f35ac60..dc72491 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -693,6 +693,7 @@ int set_anon_super(struct super_block *s, void *data)
 		return -EMFILE;
 	}
 	s->s_dev = MKDEV(0, dev & MINORMASK);
+	s->s_bdi = &noop_backing_dev_info;
 	return 0;
 }
 
@@ -954,10 +955,11 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
 	if (error < 0)
 		goto out_free_secdata;
 	BUG_ON(!mnt->mnt_sb);
+	BUG_ON(!mnt->mnt_sb->s_bdi);
 
- 	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
- 	if (error)
- 		goto out_sb;
+	error = security_sb_kern_mount(mnt->mnt_sb, flags, secdata);
+	if (error)
+		goto out_sb;
 
 	/*
 	 * filesystems should never set s_maxbytes larger than MAX_LFS_FILESIZE
diff --git a/fs/sync.c b/fs/sync.c
index fc5c3d7..92b2281 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -14,6 +14,7 @@
 #include <linux/pagemap.h>
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
+#include <linux/backing-dev.h>
 #include "internal.h"
 
 #define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
@@ -32,7 +33,7 @@ static int __sync_filesystem(struct super_block *sb, int wait)
 	 * This should be safe, as we require bdi backing to actually
 	 * write out data in the first place
 	 */
-	if (!sb->s_bdi)
+	if (!sb->s_bdi || sb->s_bdi == &noop_backing_dev_info)
 		return 0;
 
 	if (sb->s_qcop && sb->s_qcop->quota_sync)
diff --git a/include/linux/backing-dev.h ...
From: Jens Axboe
Date: Friday, April 23, 2010 - 3:05 am

Looks sane, it's a good start. I think we should augment that with a
check to ensure that we don't ever add dirty inodes to this bdi, since
it's not going to be flushed.

Something like a:

        WARN_ON(bdi == &noop_backing_dev_info);

to __mark_inode_dirty(). Looking at the code it should already trigger a
warning, since it'll check for BDI_CAP_NO_WRITEBACK (which isn't set
for noop_backing_dev_info) and the fact that noop-bdi isn't registered
to begin with.

So it's probably safe and good enough as-is, I'll add it. Thanks!

-- 
Jens Axboe

--

From: Jörn Engel
Date: Friday, April 23, 2010 - 1:55 pm

I cannot see this patch in your tree yet.  Could be the weekend or a
deliberate decision not to send this for 2.6.34-rc anymore.

In case it was a deliberate decision, can we please make it explicit?  I
don't like the idea of adding a BUG_ON() that potentially triggers for
thousands of people this late in the stabilization process - but it is
better than having people lose data.  Even if we already ran two stable
kernels that way.

Damned if you do, damned if you don't. :(

Jörn

-- 
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt
--

From: Jens Axboe
Date: Monday, April 26, 2010 - 2:48 am

It's there, I put it in yesterday. It's definitely 2.6.34-rc material, I

Yeah, it's a bad situation to be in. I changed that BUG_ON() to a
WARN_ON(). I'm not too worried about that part, I'm more worried about
the file system changed. OTOH, they do lack proper flushing now, so it's
likely not a huge risk from that perspective.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 26, 2010 - 7:32 am

It is worse still.  Using mtd->backing_dev_info results in 
kernel BUG at fs/fs-writeback.c:157

which is BUG_ON(!work->seen); in bdi_queue_work().  Logfs is affected
and I bet jffs2 is as well.  So much for dwmw2 or me actually testing
the fix. :(

I did a hexdump to see what sb->s_bdi actually contained and the result
was this:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00 00 00 00 00 00 00 00 ...
From: Jens Axboe
Date: Monday, April 26, 2010 - 7:38 am

The important bit is that each bdi must be initialized and registered if
it's going to be handling dirty data, it can't just be a static
placeholder. See the bdi_setup_and_register() helper I added.

-- 
Jens Axboe

--

From: Jens Axboe
Date: Monday, April 26, 2010 - 7:45 am

Took a quick look, and you want bdi_setup_and_register() for the three
bdis listed in mtdbdi.c in mtdcore.c:init_mtd(). Or manual bdi_init()
and bdi_register(). I'll take a look post-dinner. Either is workable,
but since the flags are already setup, the latter may be cleaner.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 26, 2010 - 9:30 am

Ok, here are two patches that appear to solve the problem for me.

Jörn

-- 
Courage is not the absence of fear, but rather the judgement that
something else is more important than fear.
-- Ambrose Redmoon

Removes one .h and one .c file that are never used outside of
mtdcore.c.

Signed-off-by: Joern Engel <joern@logfs.org>
---
 drivers/mtd/Makefile        |    2 +-
 drivers/mtd/devices/drais.c |    2 +-
 drivers/mtd/internal.h      |   17 -----------------
 drivers/mtd/mtdbdi.c        |   43 -------------------------------------------
 drivers/mtd/mtdcore.c       |   33 ++++++++++++++++++++++++++++++++-
 5 files changed, 34 insertions(+), 63 deletions(-)

diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 82d1e4d..4521b1e 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -4,7 +4,7 @@
 
 # Core functionality.
 obj-$(CONFIG_MTD)		+= mtd.o
-mtd-y				:= mtdcore.o mtdsuper.o mtdbdi.o
+mtd-y				:= mtdcore.o mtdsuper.o
 mtd-$(CONFIG_MTD_PARTITIONS)	+= mtdpart.o
 
 obj-$(CONFIG_MTD_CONCAT)	+= mtdconcat.o
diff --git a/drivers/mtd/devices/drais.c b/drivers/mtd/devices/drais.c
index b93ff3f..6f81b28 100644
--- a/drivers/mtd/devices/drais.c
+++ b/drivers/mtd/devices/drais.c
@@ -33,7 +33,7 @@
 #include <linux/sched.h>
 #include "drais.h"
 
-#undef DEBUG_ERASE
+#define DEBUG_ERASE
 
 #define DRAIS_BB_MAGIC		0x3911b26ba5a931d8ull
 
diff --git a/drivers/mtd/internal.h b/drivers/mtd/internal.h
index c658fe7..e69de29 100644
--- a/drivers/mtd/internal.h
+++ b/drivers/mtd/internal.h
@@ -1,17 +0,0 @@
-/* Internal MTD definitions
- *
- * Copyright © 2006 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-/*
- * mtdbdi.c
- */
-extern struct ...
From: Jörn Engel
Date: Monday, April 26, 2010 - 9:31 am

Otherwise we hit a BUG_ON in bdi_queue_work().

Signed-off-by: Joern Engel <joern@logfs.org>
---
 drivers/mtd/mtdcore.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index cb4858b..8dd3e46 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -299,7 +299,7 @@ static struct device_type mtd_devtype = {
 
 int add_mtd_device(struct mtd_info *mtd)
 {
-	int i;
+	int i, err;
 
 	if (!mtd->backing_dev_info) {
 		switch (mtd->type) {
@@ -322,6 +322,12 @@ int add_mtd_device(struct mtd_info *mtd)
 		if (!mtd_table[i]) {
 			struct mtd_notifier *not;
 
+			err = bdi_register(mtd->backing_dev_info, NULL, "mtd%d",
+					i);
+			if (err) {
+				/* We lose the errno information :( */
+				break;
+			}
 			mtd_table[i] = mtd;
 			mtd->index = i;
 			mtd->usecount = 0;
@@ -692,6 +698,15 @@ static int __init init_mtd(void)
 	int ret;
 	ret = class_register(&mtd_class);
 
+	ret = bdi_init(&mtd_bdi_unmappable);
+	if (ret)
+		return ret;
+	ret = bdi_init(&mtd_bdi_ro_mappable);
+	if (ret)
+		return ret;
+	ret = bdi_init(&mtd_bdi_rw_mappable);
+	if (ret)
+		return ret;
 	if (ret) {
 		pr_err("Error registering mtd class: %d\n", ret);
 		return ret;
-- 
1.6.2.1

--

From: Jens Axboe
Date: Monday, April 26, 2010 - 10:02 am

Do the bdi_register() here as well.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 26, 2010 - 10:12 am

-ENOBRAIN

Initially I wanted to do just than.  Then I looked at the block layer
and thought we could create one backing_dev_info per mtd as well.  Not
necessarily a bad if I would actually _create_ them and not just reuse
the same ones over and over again.

Jörn

-- 
Linux [...] existed just for discussion between people who wanted
to show off how geeky they were.
-- Rob Enderle
--

From: Jens Axboe
Date: Tuesday, April 27, 2010 - 12:52 am

I cooked up that patch myself, here:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0661b1ac5d48eb47c8a5948c0554fea...

Care to give it a quick spin?

-- 
Jens Axboe

--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 1:11 am

I Jens,
I'm Paolo Minazzi and I have a problem with logfs and 2.6.34rc5.
Can I apply the patch to rc5 and make a test on my ARM board ?
Paolo



--

From: Jens Axboe
Date: Tuesday, April 27, 2010 - 1:16 am

Yes please do, below are the collected fixes for mtd in this area. It
should apply to recent -git just fine.

diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index 82d1e4d..4521b1e 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -4,7 +4,7 @@
 
 # Core functionality.
 obj-$(CONFIG_MTD)		+= mtd.o
-mtd-y				:= mtdcore.o mtdsuper.o mtdbdi.o
+mtd-y				:= mtdcore.o mtdsuper.o
 mtd-$(CONFIG_MTD_PARTITIONS)	+= mtdpart.o
 
 obj-$(CONFIG_MTD_CONCAT)	+= mtdconcat.o
diff --git a/drivers/mtd/internal.h b/drivers/mtd/internal.h
index c658fe7..e69de29 100644
--- a/drivers/mtd/internal.h
+++ b/drivers/mtd/internal.h
@@ -1,17 +0,0 @@
-/* Internal MTD definitions
- *
- * Copyright © 2006 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-/*
- * mtdbdi.c
- */
-extern struct backing_dev_info mtd_bdi_unmappable;
-extern struct backing_dev_info mtd_bdi_ro_mappable;
-extern struct backing_dev_info mtd_bdi_rw_mappable;
diff --git a/drivers/mtd/mtdbdi.c b/drivers/mtd/mtdbdi.c
index 5ca5aed..e69de29 100644
--- a/drivers/mtd/mtdbdi.c
+++ b/drivers/mtd/mtdbdi.c
@@ -1,43 +0,0 @@
-/* MTD backing device capabilities
- *
- * Copyright © 2006 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/backing-dev.h>
-#include <linux/mtd/mtd.h>
-#include "internal.h"
-
-/*
- * backing device capabilities for non-mappable devices (such as NAND flash)
- * ...
From: Jörn Engel
Date: Tuesday, April 27, 2010 - 4:29 am

Works for me.

Acked-and-tested-by: Joern Engel <joern@logfs.org>

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack
--

From: Jens Axboe
Date: Tuesday, April 27, 2010 - 4:33 am

Goodness, thanks for testing. Now we are pretty close to have something
we can send out.

-- 
Jens Axboe

--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 2:01 am

I have tried this patch.
I have enabled LOGFS, but not mounted partition with it.
/dev/mtdblock1 is my romfs root partition and it is OK.

The problem is that init cannot mount my /dev/mtdblock1 romfs root.

This is the fault :
Platform: Cirrus Logic EDB9315A Board (ARM920T) Rev A
Copyright (C) 2000, 2001, 2002, Red Hat, Inc.
       |----------------------------------------------
Raw file loaded 0x00080000-0x001dce6b, assumed entry at
0x00080000-----------------------------------------------------------
RedBoot> exec -s 0x00b00000 -r 0x00a00000 -c "root=/dev/mtdblock1
console=ttyAM console=tty1"--------------------------------
ENTRY=0xc0008000-------------------------------------------------------------------------------------------------------------
LENGTH=0x00300000------------------------------------------------------------------------------------------------------------
BASE_ADDR=0x00080000---------------------------------------------------------------------------------------------------------
Using base address 0x00080000 and length
0x00300000--------------------------------------------------------------------------
Uncompressing Linux... done, booting the kernel.
Linux version 2.6.34-rc5 (root@darkstar) (gcc version 3.4.3) #18 Tue
Apr 27 10:55:55 CEST 2010
CPU: ARM920T [41129200] revision 0 (ARMv4T), cr=c0007177
CPU: VIVT data cache, VIVT instruction cache
Machine: Cirrus Logic EDB9315A Evaluation Board
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: root=/dev/mtdblock1 console=ttyAM console=tty1
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 126928k/126928k available, 4144k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000 ...
From: Jens Axboe
Date: Tuesday, April 27, 2010 - 2:16 am

So this issue looks unrelated, but a bug none the less (if just enabling
logfs breaks the mount).

-- 
Jens Axboe

--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 2:26 am

To be precise, I say :

- downloaded 2.6.34rc5
- apply the jens patch
- the logfs code is the code in the 2.6.34rc5 (I have no applied other
jorn patch)

Paolo

--

From: Jens Axboe
Date: Tuesday, April 27, 2010 - 2:29 am

(please stop top posting, I fixed this one up for you).

Just to be on the safe side - without the patch, does it work or not?

-- 
Jens Axboe

--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 2:36 am

Without the patch for me means the standard 2.6.34rc5.
In this case :
- enabling logfs I have an oop in fs/write-back.c:157 before kernel calls init.
- disabling logfs I can do normal mount my root /dev/mtdblock1

This is my experience.

Paolo
--

From: Jörn Engel
Date: Tuesday, April 27, 2010 - 4:17 am

If you add "rootfstype=romfs" to the command line, does the problem
still exist?

Jörn

-- 
And spam is a useful source of entropy for /dev/random too!
-- Jasmine Strong
--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 4:31 am

Jorn , you are right.
It seems work....
please wait....
--

From: Jörn Engel
Date: Tuesday, April 27, 2010 - 4:40 am

Ok, I'm pretty sure that logfs returns -EIO where it should return
-EINVAL.  If filesystems are tried in alphabetical order, logfs comes
first and -EIO tells the kernel to stop trying and panic, essentially.

Will cook up a patch...

Jörn

-- 
A defeated army first battles and then seeks victory.
-- Sun Tzu
--

From: Jörn Engel
Date: Tuesday, April 27, 2010 - 4:48 am

Does the patch below solve the problem for you (without the explicit
"rootfstype=romfs")?

Jörn

-- 
One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson.


Signed-off-by: Joern Engel <joern@logfs.org>
---
 fs/logfs/super.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/logfs/super.c b/fs/logfs/super.c
index 5866ee6..f649038 100644
--- a/fs/logfs/super.c
+++ b/fs/logfs/super.c
@@ -413,7 +413,7 @@ static int __logfs_read_sb(struct super_block *sb)
 
 	page = find_super_block(sb);
 	if (!page)
-		return -EIO;
+		return -EINVAL;
 
 	ds = page_address(page);
 	super->s_size = be64_to_cpu(ds->ds_filesystem_size);
-- 
1.6.2.1

--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 5:53 am

The above patch solve the problem for me.
Now I can mount root without rootfstype=romfs

Bye,
Paolo
--

From: Paolo Minazzi
Date: Tuesday, April 27, 2010 - 4:54 am

Ok, now it is very better.
My problem now is that logfs does not save files in the following case:
1) power-off, then power-on (without umount)
2) sync, then power-off, then power-on (without umount)

Using umount sometimes seems works, sometimes I have the following oop :

~ # mount -t logfs /dev/mtdblock7 /mitrolbackup
~ # cd /mitrolbackup
/mitrolbackup # echo ciao > ciao
/mitrolbackup # echo pino > pino
/mitrolbackup # ls
ciao  pino
/mitrolbackup # cd ..
~ # umount /mitrolbackup/
kernel BUG at fs/logfs/readwrite.c:1976!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c7db0000
[00000000] *pgd=c7dac031, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1]
last sysfs file:
Modules linked in:
CPU: 0    Not tainted  (2.6.34-rc5 #19)
PC is at __bug+0x20/0x2c
LR is at release_console_sem+0x1b8/0x1ec
pc : [<c003a1ac>]    lr : [<c004760c>]    psr: 60000013
sp : c7d5fe60  ip : c7d5fd98  fp : c7d5fe6c
r10: 00000003  r9 : 00000000  r8 : c7d5feb4
r7 : 00000003  r6 : c7d5feb4  r5 : c7819878  r4 : c7819870
r3 : 00000000  r2 : 60000013  r1 : 000019c2  r0 : 0000002f
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: c7db0000  DAC: 00000015
Process umount (pid: 307, stack limit = 0xc7d5e260)
Stack: (0xc7d5fe60 to 0xc7d60000)
fe60: c7d5fe7c c7d5fe70 c00fb2f0 c003a19c c7d5fe90 c7d5fe80 c00ad358 c00fb274
fe80: c7819870 c7d5feb0 c7d5fe94 c00ad3e4 c00ad2bc c7d46a64 c7d46a54 c7819880
fea0: c7d46a64 c7d5fee4 c7d5feb4 c00ad5a8 c00ad3d0 c7d5feb4 c7d5feb4 c7d46a00
fec0: c7d5e000 c022a324 00000000 c7d5ff54 c7d2b880 00000000 c7d5ff00 c7d5fee8
fee0: c009c8ec c00ad4c0 c7d46a00 c7c14000 c7d2b880 c7d5ff1c c7d5ff04 c00feb38
ff00: c009c84c c7d46a00 c02b7b84 c7d2b880 c7d5ff34 c7d5ff20 c009c740 c00feb00
ff20: c7d2b880 c7d46a00 c7d5ff4c c7d5ff38 c00b0bd0 c009c708 00000000 c7d2b898
ff40: c7d5ff94 c7d5ff50 c00b1e90 c00b0b7c 00000000 c7d5ff54 c7d5ff54 c7d5ff5c
ff60: c7d5ff5c c7d2b880 c7817500 be8e3e0c 00000000 be8e5fb4 ...
From: Jörn Engel
Date: Tuesday, April 27, 2010 - 5:05 am

Ok, I'll have a look.

Can we move this and any other logfs problems to a seperate thread?  I'd
like to leave this one for the sync issues and not spam everyone on Cc:
with unrelated bugs. :)

Jörn

-- 
Don't patch bad code, rewrite it.
-- Kernigham and Pike, according to Rusty
--

From: Jens Axboe
Date: Monday, April 26, 2010 - 10:01 am

Leftover debugging?

-- 
Jens Axboe

--

From: Jörn Engel
Date: Monday, April 26, 2010 - 10:08 am

Yes.  Can you just remove that hunk or should I resend?

Jörn

-- 
"Error protection by error detection and correction."
-- from a university class
--

From: Jens Axboe
Date: Monday, April 26, 2010 - 10:10 am

I can kill that hunk, no problem.

-- 
Jens Axboe

--

From: Jens Axboe
Date: Thursday, April 22, 2010 - 2:03 am

It's funky, I was pretty sure there was/is code to set a default bdi for
non-bdev file systems. It appears to be missing, that's not good. So
options include:

- Add the appropriate per-sb bdi for these file systems (right fix), or
- Pre-fill default_backing_dev_info as a fallback ->s_bdi to at least
  ensure that data gets flushed (quick fix)

I'll slap together a set of fixes for this.

-- 
Jens Axboe

--

From: Jens Axboe
Date: Thursday, April 22, 2010 - 3:39 am

Here's a series for fixing these. At this point they are totally
untested except that I did compile them. Note that your analysis
appeared correct for all cases but ocfs2, which does use get_sb_bdev()
and hence gets ->s_bdi assigned.

You can see them here, I'll post the series soon:

http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-linus

The first patch is a helper addition, the rest are per-fs fixups.

-- 
Jens Axboe

--

From: David Woodhouse
Date: Thursday, April 22, 2010 - 3:58 am

Do you want to include Jörn's addition of same to get_sb_mtd_set(), with
my Acked-By: David Woodhouse <David.Woodhouse@intel.com> ? 

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

--

From: Jens Axboe
Date: Thursday, April 22, 2010 - 4:05 am

Yeah will do. I also really wanted to provide a WARN and mount fail if
we get it wrong, but I don't see an easy way to do that. Basically I'd
want to check whether the storage backing is volatile or not.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Thursday, April 22, 2010 - 4:55 am

Looks good at a cursory glance.  What's still missing is some sort of
assertion.  You are a smart person and missed this problem, twice even.
Even if you hadn't, a not so smart person can add a new filesystem and
miss s_bdi, like I did.  We want some automatism to catch this.

Jörn

-- 
"Security vulnerabilities are here to stay."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001
--

From: Jens Axboe
Date: Thursday, April 22, 2010 - 5:08 am

I totally agree, we want some way to catch this problem in the future.
Really the check needs to be something ala:

        if (!sb->s_bdi && mnt_has_storage_backing() && rw_mount)
                yell_and_fail;

but I'm not sure how best to accomplish that. We can check for ->s_bdev
and mtd like you did, but that does not catch network file systems for
instance.

-- 
Jens Axboe

--

From: Jörn Engel
Date: Thursday, April 22, 2010 - 5:17 am

One way would be to either add a flag to all safe filesystems or create
a noop_bdi for them.  It adds a line of code and some bytes[*] per
instance to most filesystems, but that's the only catchall I can think
of.

I guess if noone comes up with a better plan I'll look into that.

[*] Maybe we can steal a bit somewhere to make it less expensive.

Jörn

-- 
You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a
speed hack until you've proven that's where the bottleneck is.
-- Rob Pike
--

Previous thread: Re: [PATCH] usb: core: config.c: usb_get_configuration() simplified by Alan Stern on Saturday, April 17, 2010 - 9:13 am. (1 message)

Next thread: [PATCH 5/7] viafb: improve misc register handling by Florian Tobias Schandinat on Saturday, April 17, 2010 - 12:44 pm. (1 message)