Matthew Dillon continues to make significant progress on his HAMMER clustering filesystem for DragonFly BSD. He labeled the latest release 56c, noting that it "represents an additional significant improvement in performance, [also including] bug fixes and most of the final media changes." A significant improvement in write performance was obtained by making the filesystem block size automatically increase from 16K to 64K when a file grows larger than 1 MB. One remaining media change is required to optimize mtime and atime storage, at which point HAMMER will go into testing and bug-fixing mode. Matt noted, "HAMMER's performance is extremely good now, and its system cpu overhead has dropped to roughly the same that we get from UFS," adding, "HAMMER is now able to sustain full disk bandwidth for bulk reads and writes. HAMMER continues to have far superior random-write performance, whether the system caches are blown out or not." Discussing future plans for the filesystem, Matt noted, "I could go on and on, there's so much that can be done with this filesystem :-)" Regarding one of these plans, he offered:
"I am not going to promise it, but there is a slight chance I will be able to get mirroring working by the release. I figured out how to do it, finally. Basically the solution is to add another field to the B-Tree's internal elements... the 'most recent' transaction id, and to propogate it up all the way to the root of the tree. The mirroring code can then optimally scan the B-Tree and pick out all records that have changed relative to some transaction id, allowing it to quickly 'pick up' where it left off and construct a record-level mirror over a fully asynchronous link, without any queueing. You can't get much better then that, frankly. "
"Work continues to progress well but I've hit a few snags," noted Matthew Dillon, referring to the ongoing development of his HAMMER filesystem. He began by highlighting a number of problems with the current design, then adding, "everything else now works, and works well, including and most especially the historical access features." He continued:
"I've come to the conclusion that I am going to have to make a fairly radical change to the on-disk structures to solve these problems. On the plus side, these changes will greatly simplify the filesystem topology and greatly reduce its complexity. On the negative side, recovering space will not be instantaneous and will basically require data to be copied from one part of the filesystem to another."
Matt detailed his solution, which included getting rid of the previously described clusters, super-clusters, A-lists, and per-cluster B-Trees, "instead have just one global B-Tree for the entire filesystem, able to access any record anywhere in the filesystem", adding that the filesystem would be implemented "as one big huge circular FIFO, pretty much laid down linearly on disk, with a B-Tree to locate and access data." He detailed the many improvements, noting that this also makes it possible to provide efficient real-time mirroring. He concluded, "it will probably take a week or two to rewire the topology and another week or two to debug it. Despite the massive rewiring, the new model is much, MUCH simpler than the old, and all the B-Tree code is retained (just extended to operate across the entire filesystem instead of just within a single cluster)."
"HAMMER work is still progressing well, I hope to have most of it working in a degenerate single-cluster (64MB filesystem) case by the end of next week. (cluster == 64MB block of the disk, not cluster as in clustering)," noted Matthew Dillon on the DragonFlyBSD mailing list. He continued, "gluing the per-cluster B-Tree's together for the multi-cluster case is turning out to be more of a headache and will probably take at least 2 weeks to get working. Some fairly sophisticated heuristics will be needed to avoid unnecessary copying between clusters." Matt went on to note that the next DragonFlyBSD release will likely be delayed a month:
"I may decide to move the 2.0 release to mid-January to give myself some more time. This is similar to what we did for 1.8. Also, I think a January release is better then a Christmas release because people get busy with christmas-like things. I want the filesystem to be at least beta quality as of the release and I don't think its possible to get it there by mid-December."
"Speaking of on-disk B-Trees, ReiserFS' biggest problems are all based on its use of flexible B-Trees," suggested a reader on the DragonFlyBSD Kernel mailing list, pointing to the difficulty of detecting a failed node and then of rebuilding the B-Tree. HAMMER filesystem designer and author, Matt Dillon, explained, "if a cluster needs to be recovered, HAMMER will simply throw away the B-Tree and regenerate it from scratch using the cluster's record list. This way all B-Tree I/O operations can be asynchronous and do not have to be flushed on fsync. At the same time HAMMER will remove any records whose creation transaction id's are too large (i.e. not synchronized with the cluster header), and will zero out the delete transaction id for any records whos deletion transaction id's are too large." Matt then acknowledged:
"The real performance issue for HAMMER is going to be B-Tree insertions and rebalancing across clusters. I think most of the issues can be resolved with appropriate heuristics and by a background process to slowly rebalance clusters. This will require a lot of work, though, and only minimal rebalancing will be in [the end-of-the-year] release."