Re: [PATCH 2/2] ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted

Previous thread: Re: [PATCH 13/35] fallthru: ext2 fallthru support by Jan Blunck on Monday, April 19, 2010 - 5:40 am. (5 messages)

Next thread: [PATCH 1/2] ext4: Use bitops to read/modify EXT4_I(inode)->i_flags by Dmitry Monakhov on Monday, April 19, 2010 - 7:32 am. (8 messages)
From: Dmitry Monakhov
Date: Monday, April 19, 2010 - 7:32 am

- Fix NULL pointer deference on error path
- Extent header we found may be not latest node of the inode. In order to
  find latest extent we have to traverse a path from very beginning.

https://bugzilla.kernel.org/show_bug.cgi?id=15792

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   50 +++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7c7b1d5..13758c3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3327,7 +3327,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 {
 	struct ext4_ext_path *path = NULL;
 	struct ext4_extent_header *eh;
-	struct ext4_extent newex, *ex, *last_ex;
+	struct ext4_extent newex, *ex;
 	ext4_fsblk_t newblock;
 	int err = 0, depth, ret, cache_type;
 	unsigned int allocated = 0;
@@ -3510,17 +3510,38 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
 	}
 
 	if (unlikely(ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
-		if (unlikely(!eh->eh_entries)) {
-			EXT4_ERROR_INODE(inode,
-					 "eh->eh_entries == 0 ee_block %d",
-					 ex->ee_block);
-			err = -EIO;
-			goto out2;
-		}
-		last_ex = EXT_LAST_EXTENT(eh);
+		struct ext4_extent *last_ex = EXT_LAST_EXTENT(eh);
+		/*
+		 * Optimization: Extent header we found may not be the latest
+		 * extent of the inode, but it is already in-core memory.
+		 * Let's test against this EH to avoid unecessery IO.
+		*/
+		if (unlikely(!eh->eh_entries))
+			goto bad_eh;
 		if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
-		    + ext4_ext_get_actual_len(last_ex))
-			ext4_clear_inode_flag(inode, EXT4_INODE_EOFBLOCKS);
+			+ ext4_ext_get_actual_len(last_ex)) {
+			struct ext4_ext_path *path2;
+			struct ext4_extent_header *eh2;
+			/* Real search of the latests extent is necessery */
+			path2 = ext4_ext_find_extent(inode, EXT_MAX_BLOCK, NULL);
+			eh2 = path2[depth].p_hdr;
+			if (IS_ERR(path2)) ...
From: tytso
Date: Monday, May 24, 2010 - 9:17 pm

I split this patch into two patches, since they are addressing two
very distinct and different bugs.  As it turns out I chose a
completely different way of tackling the second issue, which has the
advantage being much simpler, and not requiring a second call to
ext4_ext_find_extent(), which can end up requiring extra disk reads.

Also note that I supplied a test case to demonstrate the problem.
This was helpful in assuring that the problem was fixed, and also in
proving that there really *was* a problem; as it turns out triggering
it is quite difficult and I would be very surprised if it has really
happenned in real life.

To construct the test case I first of all used a 1k block file system,
and then generated an extremely fragmented free space:

mkdir a; cd a
seq -f %05.0f 1 65536 | xargs touch
seq -f %05.0f 1 65536 | xargs -L 1 fallocate -l 1k
seq -f %05.0f 1 2 65536 | xargs rm
cd ..

I then created the fragmented file with the EOFBLOCKS set:

fallocate -n -l 32m foo

This should generate a file with a two-deep extent tree.  (It is
otherwise *very* hard to create a deep extent tree.)  I then found the
last block in an leaf block in the middle of the tree, and deleted the
last extent in that leaf block, using the tst_extents program found in
lib/ext2fs in the e2fsprogs sources (it is built using "make check").
In the case described in the commit, this happened to be for the
logical block 17925.

Could such a file be generated in real life?  Yes, but you'd have to
be quite unlucky, as it would require extending a sparse, fragmented
file using an i_size-preserving fallocate call, where there was a hole
at precisely the right (wrong) place in the extent tree.

			      	      	 - Ted
--

From: Theodore Ts'o
Date: Monday, May 24, 2010 - 9:18 pm

If the EOFBLOCK_FL flag is set when it should not be and the inode is
zero length, then eh_entries is zero, and ex is NULL, so dereferencing
ex to print ex->ee_block causes a kernel OOPS in
ext4_ext_map_blocks().

On top of that, the error message which is printed isn't very helpful.
So we fix this by printing something more explanatory which doesn't
involve trying to print ex->ee_block.

Addresses-Google-Bug: #2655740

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/extents.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 595ab6d..aeec5c7 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3370,8 +3370,9 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	 */
 	if (unlikely(path[depth].p_ext == NULL && depth != 0)) {
 		EXT4_ERROR_INODE(inode, "bad extent address "
-				 "iblock: %d, depth: %d pblock %lld",
-				 map->m_lblk, depth, path[depth].p_block);
+				 "lblock: %lu, depth: %d pblock %lld",
+				 (unsigned long) map->m_lblk, depth,
+				 path[depth].p_block);
 		err = -EIO;
 		goto out2;
 	}
@@ -3501,8 +3502,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	if (unlikely(ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) {
 		if (unlikely(!eh->eh_entries)) {
 			EXT4_ERROR_INODE(inode,
-					 "eh->eh_entries == 0 ee_block %d",
-					 ex->ee_block);
+					 "eh->eh_entries == 0 and "
+					 "EOFBLOCKS_FL set");
 			err = -EIO;
 			goto out2;
 		}
-- 
1.6.6.1.1.g974db.dirty

--

From: Theodore Ts'o
Date: Monday, May 24, 2010 - 9:18 pm

Dimitry Monakhov discovered an edge case where it was possible for the
EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily.  This is true;
I have a test case that can be exercised via downloading and
decompressing the file:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-testcases/eofblocks-fl-test-ca...
bunzip2 eofblocks-fl-test-case.img
dd if=/dev/zero of=eofblocks-fl-test-case.img bs=1k seek=17925 bs=1k count=1 conv=notrunc

However, triggering it in real life is highly unlikely since it
requires an extremely fragmented sparse file with a hole in exactly
the right place in the extent tree.  (It actually took quite a bit of
work to generate this test case.)  Still, it's nice to get even
extreme corner cases to be correct, so this patch makes sure that we
don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner
case.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/extents.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index aeec5c7..c7c304f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3319,7 +3319,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	struct ext4_extent_header *eh;
 	struct ext4_extent newex, *ex, *last_ex;
 	ext4_fsblk_t newblock;
-	int err = 0, depth, ret, cache_type;
+	int i, err = 0, depth, ret, cache_type;
 	unsigned int allocated = 0;
 	struct ext4_allocation_request ar;
 	ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio;
@@ -3508,8 +3508,20 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 			goto out2;
 		}
 		last_ex = EXT_LAST_EXTENT(eh);
-		if (map->m_lblk + ar.len > le32_to_cpu(last_ex->ee_block)
-		    + ext4_ext_get_actual_len(last_ex))
+		/*
+		 * If the current leaf block was reached by looking at
+		 * the last index block all the way down the tree, and
+		 * we are extending the inode beyond the last extent
+		 * in the current leaf block, then clear the
+		 ...
From: Dmitry Monakhov
Date: Tuesday, May 25, 2010 - 12:23 am

This condition was triggered during fsstress test. So I consider
it as rare but possible in real life. Nor than less it is better
to fix it now, than fix it in response from a midnight call from some
--

From: tytso
Date: Tuesday, May 25, 2010 - 6:03 am

Fair enough, I just didn't have an 8-core system handy when I was
dealing with it, so hand-crafted an fs image.

BTW, whatever happened to the xfstests patch that you created in

https://bugzilla.kernel.org/show_bug.cgi?id=15792

Did you submit it to the xfstests folks for inclusion?

    	       	     	 	  	- Ted
--

From: Dmitry Monakhov
Date: Tuesday, May 25, 2010 - 6:12 am

Yes but changes was requested, so i've prepared new version.
I'll post it against Jan's xfsqa quota branch.
--

From: tytso
Date: Tuesday, May 25, 2010 - 6:15 am

Is there a reason you don't just base it against the xfstests mainline
and send the patch directly to xfs@oss.sgi.com?  Your patch isn't
related to quota or has any dependency on other changes that might be
in Jan's branch, right?  Or am I missing something?

   	 	    	 	 	       	 - Ted


--

From: Dmitry Monakhov
Date: Tuesday, May 25, 2010 - 6:19 am

They are related because test number always increasing, and i want to
use the test to run with and w/o quota enabled.
--

Previous thread: Re: [PATCH 13/35] fallthru: ext2 fallthru support by Jan Blunck on Monday, April 19, 2010 - 5:40 am. (5 messages)

Next thread: [PATCH 1/2] ext4: Use bitops to read/modify EXT4_I(inode)->i_flags by Dmitry Monakhov on Monday, April 19, 2010 - 7:32 am. (8 messages)
<