"Btrfs v0.16 is available for download," began Chris Mason, announcing the latest release of his new Btrfs filesystem. He noted, "v0.16 has a shiny new disk format, and is not compatible with filesystems created by older Btrfs releases. But, it should be the fastest Btrfs yet, with a wide variety of scalability fixes and new features." Improved scalability and performance improvements include fine grained btree locking, pushing CPU intensive operations such as checksumming into their own background threads, improved
data=ordered mode, and a new cache to reduce IO requirements when cleaning up old transactions. Other new features include support for ACLs, prevention of orphaned inodes so files won't be lost after a crash, and a more robust directory index format. Chris noted:
"There are still more disk format changes planned, but we're making every effort to get them out of the way as quickly as we can. You can see the major features we have planned on the development timeline. [...] the btrfs kernel module now weighs in at 30,000 LOC, which means we're getting very close to the size of ext."
"I think our mistake is the assumption that everyone who wants to contribute to the kernel wants to code, or that everyone wants to end up at the top of the major feature contributors list. For lots of people, we're going to be that experiment from college they never quite want to admit to later on in life."
"Btrfs v0.14 is now available for download," Chris Mason announced, adding, "please note the disk format has changed, and it is not compatible with older versions of Btrfs." The project has gained a new wiki home page on the kernel.org domain, where it is explained, "Btrfs is a new copy on write filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration. Initially developed by Oracle, Btrfs is licensed under the GPL and open for contribution from anyone." Regarding the latest release, Chris explained:
"v0.14 has a few performance fixes and closes some races that could have allowed corrupted metadata in v0.13. The major new feature is the ability to manage multiple devices under a single Btrfs mount. Raid0, raid1 and raid10 are supported. Even for single device filesystems, metadata is now duplicated by default. Checksums are verified after reads finish and duplicate copies are used if the checksums don't match."
Chris offered links to multi-device benchmarks summarizing, "in general these numbers show that Btrfs does a good job at scaling to this storage configuration, and that is it on par with both HW raid and MD." Looking forward, he concluded, "next up on the Btrfs todo list is finishing off the device removal and IO error handling code. After that I'll add more fine grained locking to the btrees."
"There are lots of things in the FS that need deep thought,and the perfect system to fully use the first 64k of a 1TB filesystem isn't quite at the top of my list right now."
"I wasn't planning on releasing v0.12 yet, and it was supposed to have some initial support for multiple devices. But, I have made a number of performance fixes and small bug fixes, and I wanted to get them out there before the (destabilizing) work on multiple-devices took over," explained Chris Mason regarding the 0.12 release of his new btrfs filesytem. Btrfs was first announced in June of 2007, as an alpha-quality filesystem offering checksumming of all files and metadata, extent based file storage, efficient packing of small files, dynamic inode allocation, writable snapshots, object level mirroring and striping, and fast offline filesystem checks, among other features. The project's website explains, "Linux has a wealth of filesystems to choose from, but we are facing a number of challenges with scaling to the large storage subsystems that are becoming common in today's data centers. Filesystems need to scale in their ability to address and manage large storage, and also in their ability to detect, repair and tolerate errors in the data stored on disk." Regarding the latest release, Chris offered:
"So, here's v0.12. It comes with a shiny new disk format (sorry), but the gain is dramatically better random writes to existing files. In testing here, the random write phase of tiobench went from 1MB/s to 30MB/s. The fix was to change the way back references for file extents were hashed."
Chris Mason announced version 0.10 of his new Btrfs filesystem, listing the following new features, "explicit back references, online resizing (including shrinking), in place conversion from Ext3 to Btrfs, data=ordered support, mount options to disable data COW and checksumming, and barrier support for sata and IDE drives". He noted that the disk format in v0.10 has changed, and is not compatible with the v0.9 disk format. Regarding back reference support, Chris explained, "the core of this release is explicit back references for all metadata blocks, data extents, and directory items. These are a crucial building block for future features such as online fsck and migration between devices. The back references are verified during deletes, and the extent back references are checked by the existing offline fsck tool." He then detailed the new Ext3 to Btrfs conversion utility:
"The conversion program uses the copy on write nature of Btrfs to preserve the original Ext3 FS, sharing the data blocks between Btrfs and Ext3 metadata. Btrfs metadata is created inside the free space of the Ext3 filesystem, and it is possible to either make the conversion permanent (reclaiming the space used by Ext3) or roll back the conversion to the original Ext3 filesystem."
"I've finally got some numbers to go along with the Btrfs variable blocksize feature. The basic idea is to create a read/write interface to map a range of bytes on the address space, and use it in Btrfs for all metadata operations (file operations have always been extent based)," explained Chris Mason in a recent posting to the Linux Filesystem Development mailing list. He linked to some benchmark results and summarized, "the first round of benchmarking shows that larger block sizes do consume more CPU, especially in metadata intensive workloads, but overall read speeds are much better." Chris then noted, "Dave reported that XFS saw much higher write throughput with large blocksizes, but so far I'm seeing the most benefits during reads." David Chinner replied, "the basic conclusion is that different filesystems will benefit in different ways with large block sizes...." explaining:
"Btrfs linearises writes due to it's COW behaviour and this is trades off read speed. i.e. we take more seeks to read data so we can keep the write speed high. By using large blocks, you're reducing the number of seeks needed to find anything, and hence the read speed will increase. Write speed will be pretty much unchanged because btrfs does linear writes no matter the block size.
"XFS doesn't linearise writes and optimises it's layout for a large number of disks and a low number of seeks on reads - the opposite of btrfs. Hence large block sizes reduce the number of writes XFS needs to write a given set of data+metadata and hence write speed increases much more than the read speed (until you get to large tree traversals)."
Chris Mason announced an early alpha release of his new Btrfs filesystem, "after the last FS summit, I started working on a new filesystem that maintains checksums of all file data and metadata." He listed the following features as "mostly implemented": "extent based file storage (2^64 max file size), space efficient packing of small files, space efficient indexed directories, dynamic inode allocation, writable snapshots, subvolumes (separate internal filesystem roots), checksums on data and metadata (multiple algorithms available), very fast offline filesystem check". He listed the following features as yet to be implemented: "object level mirroring and striping, strong integration with device mapper for multiple device support, online filesystem check, efficient incremental backup and FS mirroring". Regarding the current state of the project, Chris said:
"The current status is a very early alpha state, and the kernel code weighs in at a sparsely commented 10,547 lines. I'm releasing now in hopes of finding people interested in testing, benchmarking, documenting, and contributing to the code. I've gotten this far pretty quickly, and plan on continuing to knock off the features as fast as I can. Hopefully I'll manage a release every few weeks or so. The disk format will probably change in some major way every couple of releases."