"Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based on its use of flexible B-Trees," suggested a reader on the DragonFlyBSD Kernel mailing list, pointing to the difficulty of detecting a failed node and then of rebuilding the B-Tree. HAMMER filesystem designer and author, Matt Dillon, explained, "if a cluster needs to be recovered, HAMMER will simply throw away the B-Tree and regenerate it from scratch using the cluster's record list. This way all B-Tree I/O operations can be asynchronous and do not have to be flushed on fsync. At the same time HAMMER will remove any records whose creation transaction id's are too large (i.e. not synchronized with the cluster header), and will zero out the delete transaction id for any records whos deletion transaction id's are too large." Matt then acknowledged:
"The real performance issue for HAMMER is going to be B-Tree insertions and rebalancing across clusters. I think most of the issues can be resolved with appropriate heuristics and by a background process to slowly rebalance clusters. This will require a lot of work, though, and only minimal rebalancing will be in [the end-of-the-year] release."
"I've never looked at the Reiser code though the comments I get from friends who use it are on the order of 'extremely reliable but not the fastest filesystem in the world'," Matt Dillon explained when asked to compare his new clustering HAMMER filesystem with ReiserFS, both of which utilize BTrees to organize objects and records. He continued, "I don't expect HAMMER to be slow. A B-Tree typically uses a fairly small radix in the 8-64 range (HAMMER uses 8 for now). A standard indirect block methodology typically uses a much larger radix, such as 512, but is only able to organize information in a very restricted, linear way." He continued to describe numerous plans he has for optimizing performance, "my expectation is that this will lead to a fairly fast filesystem. We will know in about a month :-)"
Among the optimizations planned, Matt explained, "the main thing you want to do is to issue large I/Os which cover multiple B-Tree nodes and then arrange the physical layout of the B-Tree such that a linear I/O will cover the most likely path(s), thus reducing the actual number of physical I/O's needed." He noted, "HAMMER will also be able to issue 100% asynchronous I/Os for all B-Tree operations, because it doesn't need an intact B-Tree for recovery of the filesystem." He went on to describe another potential optimization allowed by the filesystem's design, "HAMMER is designed to allow clusters-by-cluster reoptimization of the storage layout. Anything that isn't optimally layed-out at the time it was created can be re-layed-out at some later time, e.g. with a continuously running background process or a nightly cron job or something of that ilk. This will allow HAMMER to choose to use an expedient layout instead of an optimal one in its critical path and then 'fix' the layout later on to make re-accesses optimal."
The future of Reiser4 was raised on the lkml, with the filesystem's creator, Hans Reiser [interview], awaiting his May 7'th trial [story]. Concerns that the filesystem wasn't being maintained were laid to rest when Andrew Morton [interview] stated, "the namesys engineers continue to maintain reiser4 and I continue to receive patches for it." He further added, "the namesys guys are responsive and play well with others." As to why the filesystem hasn't yet been merged into the 2.6 kernel, Andrew explained, "to get it unstuck we'd need a general push, get people looking at and testing the code, get the vendors to have a serious think about it, etc. We could do that - it'd require that the namesys people (and I) start making threatening noises about merging it, I guess." He then made joking reference to the recent debate regarding the new CPU schedulers [story], "or we could move all the reiser4 code into kernel/sched.c - that seems to get people fired up."
Namesys developer and author of the Reiser4 encryption and compression plugins, Edward Shiskin, offered some updates. Replying to some comments about the need to remove plugins from the Reiser4 code he explained, "the popular opinion that plugins make more sense in the VFS is a great delusion, as plugins are entities related to reiser4 disk layouts." In an earlier thread it had been suggested that the plugins were misnamed and would be better called an internal abstraction layer [story]. Edward went on to note, "currently there are two namesys employees working [on Reiser4] mostly on enthusiasm." He linked to a wiki page listing known issues with the code needing to be fixed before it's likely to be merged into the 2.6 kernel, "the main issues here are xattrs and support for blocksize != pagesize. I think that adding xattrs will take ~1 month of full-time working. Not sure about blocksize support." When it was noted that other filesystems have already been merged without support for either of these features, Edward said that they'd lower their priority and finish up with the other remaining issues left on the old todo list and resume the merge discussion at that time.
Hans Reiser formed Namesys and began the development of Reiserfs ten years ago. The first release of the filesystem, Reiser3, is part of the mainline 2.4 and 2.6 Linux kernels. The more recent Reiser4 is a complete redesign and reimplementation of Reiserfs, aiming to soon be merged into the mainline 2.6 Linux kernel.
In this interview, Hans discusses his background and how he came to create Namesys and Reiserfs. He looks back at Reiser3, describing the advantages it had over other filesystems when it was released and its current state. He then explores the many improvements currently in Reiser4, describing the plugin architecture and its exciting potential for future semantic enhancements.