Andrew Morton [interview] posted on the lkml, "In 2.4.20-pre5 an optimisation was made to the ext3 fsync function which can very easily cause file data corruption at unmount time". This bug only affects people using ext3 in the uncommon "data=journal" mode, or files operating under "chattr -j", and does not affect the 2.5 series of kernels.
Andrew went on to say that "The symptoms are that any file data which was written within the thirty seconds prior to the unmount may not make it to disk. A workaround is to run `sync' before unmounting". He also posted a patch to fix the problem. However, soon thereafter, he posted saying that "that 'fix' didn't fix it. Sorry about that". Until a proper fix can be developed, he recommends that people "please avoid ext3/data=journal". Since "data=journal" is not the default ext3 mode, it is unlikely most people running ext3 will be affected by this. However, it is a data corruption bug so you should double-check that you use either "data=ordered" or "data=writeback" as your ext3 mode of operation.
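One way to double-check the mode is to look at the mount options recorded for each ext3 filesystem. The following is a minimal sketch in C, not a definitive tool: `has_mount_option` and `ext3_uses_data_journal` are hypothetical helper names, and it assumes you pass in the filesystem type and the comma-separated options field as they appear in a mount table such as /etc/fstab or /etc/mtab. An ext3 entry with no "data=" option at all is treated as unaffected, since data=journal is not the default mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): returns 1 if the
 * comma-separated option string `opts` contains exactly the option
 * `want`, otherwise 0. Partial matches such as "data=journalx" do
 * not count.
 */
int has_mount_option(const char *opts, const char *want)
{
    size_t want_len;
    const char *p;

    assert(opts != NULL && want != NULL);
    want_len = strlen(want);
    p = opts;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == want_len && strncmp(p, want, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;    /* step past the comma to the next option */
    }
    return 0;
}

/*
 * Hypothetical helper: returns 1 when a mount-table entry describes an
 * ext3 filesystem mounted with data=journal, the mode the article says
 * is affected. It does not detect per-file journalling set via chattr.
 */
int ext3_uses_data_journal(const char *fstype, const char *opts)
{
    assert(fstype != NULL);
    return strcmp(fstype, "ext3") == 0 &&
           has_mount_option(opts, "data=journal");
}
```

For example, `ext3_uses_data_journal("ext3", "rw,data=journal")` flags the entry, while the same call with "rw,data=ordered" or plain "rw" does not. The type and options fields could be obtained with getmntent(3) from <mntent.h>; whether a given kernel reports the data mode in /proc/mounts is an assumption you would need to verify, so the options actually passed to mount are the safer input.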
From: Andrew Morton
To: linux-kernel Mailing List
Subject: data corrupting bug in 2.4.20 ext3, data=journal
Date: Sun Dec 01 2002 - 03:11:41 EST
In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
which can very easily cause file data corruption at unmount time. This
was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
released, and three months after the bug was merged. Unfortunate timing)
This only affects filesystems which were mounted with the `data=journal'
option. Or files which are operating under `chattr -j'. So most people
are unaffected. The problem is not present in 2.5 kernels.
The symptoms are that any file data which was written within the thirty
seconds prior to the unmount may not make it to disk. A workaround
is to run `sync' before unmounting.
The optimisation was intended to avoid writing out and waiting on the
inode's buffers when the subsequent commit would do that anyway. This
optimisation was applied to both data=journal and data=ordered modes.
But it is only valid for data=ordered mode.
In data=journal mode the data is left dirty in memory and the unmount
will silently discard it.
The fix is to only apply the optimisation to inodes
which are operating under data=ordered.
--- linux-akpm/fs/ext3/fsync.c~ext3-fsync-fix Sat Nov 30 23:37:33 2002
+++ linux-akpm-akpm/fs/ext3/fsync.c Sat Nov 30 23:39:30 2002
@@ -63,10 +63,12 @@ int ext3_sync_file(struct file * file, s
*/
ret = fsync_inode_buffers(inode);
- /* In writeback mode, we need to force out data buffers too. In
- * the other modes, ext3_force_commit takes care of forcing out
- * just the right data blocks. */
- if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_WRITEBACK_DATA)
+ /*
+ * If the inode is under ordered-data writeback it is not necessary to
+ * sync its data buffers here - commit will do that, with potentially
+ * better IO merging
+ */
+ if (!ext3_should_order_data(inode))
ret |= fsync_inode_data_buffers(inode);
ext3_force_commit(inode->i_sb);
From: Andrew Morton
To: linux-kernel Mailing List
Subject: Re: data corrupting bug in 2.4.20 ext3, data=journal
Date: Sun Dec 01 2002 - 03:52:23 EST
Andrew Morton wrote:
>
> ...
> The fix is to only apply the optimisation to inodes which are operating
> under data=ordered.
>
That "fix" didn't fix it. Sorry about that. Please avoid ext3/data=journal until it is sorted out.
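For anyone who cannot move off data=journal right away, the workaround quoted in the first message (run `sync' before unmounting) can be expressed programmatically. This is only a sketch under stated assumptions: `sync_then_umount` is a hypothetical helper name, it is Linux-specific, the caller needs the privileges that umount(2) requires, and the emails do not say whether the workaround is sufficient in every case.

```c
#include <sys/mount.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to flush dirty data to disk with
 * sync(2), then unmount `target` with umount(2). Returns 0 on success
 * or -1 with errno set by umount(2) on failure.
 *
 * Assumption: sync() has completed its writes by the time it returns.
 * POSIX only requires that the writes be scheduled, so treat this as
 * an illustration of the ordering, not a guarantee.
 */
int sync_then_umount(const char *target)
{
    sync();                 /* flush before the unmount can discard data */
    return umount(target);  /* fails without sufficient privileges */
}
```

From a shell, the same ordering is simply running sync and then umount on the mount point.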
Most people are unaffected?
One of the biggest selling points of ext3 was that it journaled both data and metadata. That's why I use it. To downplay this like no one uses that mode is a big mistake. Surely the release should be pulled or the patch rolled back or SOMETHING.
My honest opinion
Marcelo screwed up with the last 2 releases. I was joyous when he got the job, fresh blood and all; now I've learned to fear his decision-making skills.
2.4.19 introduced the horrible lag bug, 2.4.20 didn't fix it as far as I know.. and now this horrible horrible corruption bug..
-edit-
It gets worse
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0212.0/0028.html
According to the original bug reporter the bug predates 2.4.19-final...
Look on the bright side
This corruption bug is about the least horrible a corruption bug can possibly be.
Give Marcelo a Break
The last two release diffs were huge even when bzipped. Saying that he "screwed up" one out of maybe a thousand patch decisions is really nit picking.
Yeah well
Considering he is using a release model (release candidates) lightyears ahead of those before him (for stable releases), he really is making a hash of running it. Look, how many rcs come out each release? 5? What you really have to do is keep stability foremost in mind the ENTIRE way through the release and plan for 1 (ONE) release candidate, not use it so you can be lazy and have it (the release candidate model) cover your bum. It just breaks down and is no better than releasing un (widely) tested patches like previous management. A lot more people will test it if there is a good chance it will be the same as the release version. Look at most any other (respected) software project using this model.
RE: Yeah well
And that is not to say he shouldn't, say, merge a big IDE update from Alan at the start of a release, especially when it has been in Alan's tree for x months, just that he should _plan_ for 1 release candidate.
WHAT?
I personally believe that the IDE merge should have stayed in Alan's kernel, but that's beside the point.
With 2.4.19 two serious bugs were introduced, and they were not fixed in 2.4.20. The lag bug alone has forced me to use the 2.4.18 based WOLK kernel until there was a fix available. I never used ext3 much, since I prefer ReiserFS (ever since it got stable): it's faster and I've experienced fewer corruption problems with ReiserFS than I ever did with ext3.
I'm not saying that Marcelo has to go, but I don't see the reason for big merges like the IDE one in a stable kernel - I didn't see it fixing any problems nor adding some critically needed feature - why risk that in a stable kernel?
There has to be a defining line for stable kernels. Leave the dangerous merges for Alan and all the other patchset creators; nearly nobody uses a vanilla kernel after all, thus the stable kernel should be a dead stable base for vendors.
And seriously, Marcelo's RCs ain't new, Linus did the same thing - the later pres would be bugfixes only.
Unfortunately
The old IDE stuff did have its share of problems and it wasn't like a total rewrite or anything. You still need to add capability for new devices as they come out in the stable kernel. I have never had problems with ext3 personally except that bug (which I reported - I started this whole mess!). Anyway, the RCs are new, not so much for putting less stuff into later patches, but having no changes between the last rc and release.
That said, the IDE merge should have waited a couple of revisions, and some vm stuff should probably have gone in instead for 2.4.20.
lag bug???
Any url or reference on this? This is the first I've heard of it.
Ok then
It's a bug that causes pausing under load. I'm not good at explaining it, but when using X and this happens it's like really bad lagging in an online game (hence I call it the lag bug - I've also heard it called the "pauses from hell" bug, etc.)
I don't know what's causing this bug, but fact is that it's in 2.4.19 and not in 2.4.18. Might be IDE, might be some VM shit, might be something else... end result, laggy behavior on some machines.
I know Con Kolivas has had reports on pausing on his ck patchset featuring the compressed cache patch. It seems to be the same bug but in a different context, I don't think it's CC that causes this, because I haven't seen this bug with CC on 2.4.18 (WOLK)
Here's a link that seems to be about a possible fix for it, by the grand master himself... Marc-Christian Petersen.
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0211.2/0066.html
IO scheduler
The problem is reads being starved I think. The problem is being solved in 2.5 with the deadline IO scheduler. The scheduler in 2.4 starves reads really badly. There hasn't been a (quick) fix for it which is agreeable to all parties. read-latency2 works well and is in ac, but Andrea doesn't like it, neither does Jens!
maybe
Then why have I yet to see this "bug" in 2.4.18 if it's an IO scheduler bug - even when I use the backported scheduler in 2.4.19 (JP and CK amongst others) the bug is still present. (Maybe I'm thinking of a different scheduler, I'm really not much of a kernel developer)
I agree that read starvation seems to be a factor in this problem, but I doubt that it's the whole and entire cause. Still, I'm glad that there's a possible fix out, be it an ugly fix.
yes
The IO scheduler, not the process scheduler. The fixes (both read-latency2 and Andrea's fix) change the IO scheduler so reads don't get starved for as long.
lag bug
I am not sure what the bug actually is but I seem to have it. Whenever I run 2.4.x kernels and do large cvs updates, or compiles, my system starts lagging so badly that even mouse movement is jerky/nonexistent. I started running 2.5.x to get away from the issue.
Chris Cheney
ccheney@cheney.cx
Rollback
You could rollback to an earlier kernel version, you know.
This sort of thing is why I treat recently released kernels as 'unstable'. There have been a number of cases where the so-called stable branch has had gotchas in the code.
Not recent
Define "Recently released kernels". Take a look at how long it was since 2.4.20 was released, for an example.
Incorrect patch
Oh and by the way, the patch does NOT fix the problem.
Uh huh
Oh, so that's what "That "fix" didn't fix it. Sorry about that." means. ;-)
ext3?
I didn't think anyone still used ext* except perhaps those upgrading legacy servers. ext3 is really slow in my experience compared to the more modern file systems like ReiserFS (which had its fair share of issues, but a long time ago). ReiserFS seems to be the default now in most distros anyway. When I have to pay through the nose for my storage hardware, I like to know that my file system provides stability and performance to match it. Can't wait for Reiser4!
Reiser Default!?
I really don't think so! Maybe in Mandrake, but not Debian, RedHat or Gentoo. ReiserFS has its own history and baggage. If you like it - I won't argue the point. ext3 is demonstrably faster on any number of real-world tasks, and is getting 300% perf boost in the newest 2.5 series.
I use the SGI XFS, knowing it to be Beta-quality. This means backups via Duplicity/rsync and EVMS/OpenAFS snapshots. I'm an old Irix-er, and I have my homedirs in /usr/people, too...
Backup
xfsdump couldn't do the job? Won't anything like Backup Exec, ARCServe do the trick? Sorry for being off topic, but I'm looking into backing up all permissions and ACLs for Samba share on XFS.
I use it
I use it because it is backwards compatible with ext2, offers data journaling, journal on a separate device, and is very stable and robust IMO regardless of this bug.
ext3 - journaling upgrade on the cheap :)
I wanted to go with Reiser, but I've got a ton of data I don't want to lose and I can afford neither a backup drive nor another hard drive to dump the data before converting. Ah well, I guess Christmas is coming up... ;)
Well
I don't know what sort of system you're running, but here, ext3+htree performs very nearly as well as reiser, and it's a lot easier to support :)
sync before unmount
I'm 'sync'ing before 'unmount'ing on Linux every time. Maybe I'm a bit over-paranoid, but it seems to be a good idea in general.
I've done this since I first booted into FreeBSD and saw this behaviour, so the sync that is in my BSD init script is in place on Linux, too.
Besides, it's usual that even stable kernels (whatever the OS) still contain lots of risks that surface soon after the release, so people should always update fast and, more importantly, back up all the important data they have. (And no: no .tar.gz on the same partition ;-)
Eugene
Congratulations
Posted to Slashdot. Twice. Maybe they forgot to sync the first one to disk.
Bug Stomping
I'm curious to hear if anybody has any specific ideas on how this bug got out in a 'stable' release. Is it a case of too many patches merged, not enough time spent testing, etc.? I think it's important that we consider any procedural changes to ensure that point releases are as stable as possible. .20 is pretty late in the game to have this type of error IMO and I'd like to know what could be done to prevent further flubs.
Nah
It's more a matter of nobody much getting affected by it. It had been reported on lkml before 2.4.19 but nobody really noticed. Mostly you'd only be unmounting an ext3 disk before a reboot and I think most init scripts sync beforehand.
Correct patch
Has the bug already been fixed? If not, what's taking them so long? I want to upgrade to 2.4.20!
Fix
Andrew Morton posted a new fix.
How good is that patch?
How good is that patch? Does it solve the problem completely? I'm not sure after reading his email.
Help: I've suffered this Data corrupting ext3 bug in 2.4.20
Everything was going smooth... until I rebooted. nooooo.......
This is the error message I got:
fsck.ext3: Invalid argument : couldn't load ext3 journal for /dev/hda3
So how am I meant to repair this problem? How can I boot Linux?