Chris Mason announced version 0.10 of his new Btrfs filesystem, listing the following new features, "explicit back references, online resizing (including shrinking), in place conversion from Ext3 to Btrfs, data=ordered support, mount options to disable data COW and checksumming, and barrier support for sata and IDE drives". He noted that the disk format in v0.10 has changed, and is not compatible with the v0.9 disk format. Regarding back reference support, Chris explained, "the core of this release is explicit back references for all metadata blocks, data extents, and directory items. These are a crucial building block for future features such as online fsck and migration between devices. The back references are verified during deletes, and the extent back references are checked by the existing offline fsck tool." He then detailed the new Ext3 to Btrfs conversion utility:
"The conversion program uses the copy on write nature of Btrfs to preserve the original Ext3 FS, sharing the data blocks between Btrfs and Ext3 metadata. Btrfs metadata is created inside the free space of the Ext3 filesystem, and it is possible to either make the conversion permanent (reclaiming the space used by Ext3) or roll back the conversion to the original Ext3 filesystem."
"This patch speeds up e2fsck on Ext3 significantly using a technique called Metaclustering," stated Abhishek Rai. In an earlier thread he quantified this claim, "this patch will help reduce full fsck time for ext3. I've seen 50-65% reduction in fsck time when using this patch on a near-full file system. With some fsck optimizations, this figure becomes 80%." Most criticism so far has been in regards to formatting issues with the patch preventing it from being easily tested, resolved in the latest postings. It was also cautioned that the patch affects a significant amount of ext3 code, and thus will require very heavy testing. Abhishek described how the patch offers its significant gains for e2fsck:
"Metaclustering refers to storing indirect blocks in clusters on a per-group basis instead of spreading them out along with the data blocks. This makes e2fsck faster since it can now read and verify all indirect blocks without much seeks. However, done naively it can affect IO performance, so we have built in some optimizations to prevent that from happening. Finally, the benefit in fsck performance is noticeable only when indirect block reads are the bottleneck which is not always the case, but quite frequently is, in the case of moderate to large disks with lot of data on them. However, when indirect block reads are not the bottleneck, e2fsck is generally quite fast anyway to warrant any performance improvements."
"I've just released the 2.6.23-rc9-ext4-1. It collapses some patches in preparation for pushing them to Linus, and adds some of the cleanup patches that had been incorporated into Andrew's broken-out-2007-10-01-04-09 series," announced Theodore Ts'o. He also noted of the current ext4 git tree, "it also has some new development patches in the unstable (not yet ready to push to mainline) portion of the patch series." In an earlier thread Theodore posted a series of patches specifically intended for inclusion in the upcoming 2.6.24 kernel. Included in the patch series was a patch for improving fsck performance, "in performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count." The patch included an explanation of how the feature works, enabled through a mkfs option:
"With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation."
"In [the first pass] of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use," Avantika Mathur began. "This is the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes." She went on to explain how it works, "with this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation." Avantika attached a graph illustrating the advantage of the patch which she summarized as follows:
"The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is."
Chris Mason announced an early alpha release of his new Btrfs filesystem, "after the last FS summit, I started working on a new filesystem that maintains checksums of all file data and metadata." He listed the following features as "mostly implemented": "extent based file storage (2^64 max file size), space efficient packing of small files, space efficient indexed directories, dynamic inode allocation, writable snapshots, subvolumes (separate internal filesystem roots), checksums on data and metadata (multiple algorithms available), very fast offline filesystem check". He listed the following features as yet to be implemented: "object level mirroring and striping, strong integration with device mapper for multiple device support, online filesystem check, efficient incremental backup and FS mirroring". Regarding the current state of the project, Chris said:
"The current status is a very early alpha state, and the kernel code weighs in at a sparsely commented 10,547 lines. I'm releasing now in hopes of finding people interested in testing, benchmarking, documenting, and contributing to the code. I've gotten this far pretty quickly, and plan on continuing to knock off the features as fast as I can. Hopefully I'll manage a release every few weeks or so. The disk format will probably change in some major way every couple of releases."