Daniel Phillips noted that his new Tux3 versioning filesystem is now operating like a filesystem, "the last burst of checkins has brought Tux3 to the point where it undeniably acts like a filesystem: one can write files, go away, come back later and read those files by name. We can see some of the hoped for attractiveness starting to emerge: Tux3 clearly does scale from the very small to the very big at the same time. We have our Exabyte file with 4K blocksize and we can also create 64 Petabyte files using 256 byte blocks." He went on to discuss some of the remaining features yet to be implemented, including atomic commits, versioning, coalesce on delete, a version of the filesystem written in the kernel, extents, locking, and extended attributes.
Reviewing the above list, Daniel decided he would work next on the coalesce on delete functionality, noting, "without this we can still delete files but we cannot recover file index blocks, only empty them, not so good." He added that at this time he was only going to focus on file truncation, "as soon as file truncation is added to the test mix we will see much more interesting behavior from the bitmap allocator, and we will discover some great ways to generate horrible fragmentation issues. Yummy." Daniel continued to point out that Tux3 is an open source project, and as such is always looking for others to participate, "whoever wants to carve their initials on what is starting to look like a for-real Linux filesystem, now is a great time to take a flyer. The code base is still tiny, builds fast, has lots of interactive feedback and is easy to work on. And you get to put your email address near the beginning of the list, which will naturally write its way into the history of open source. Probably."
"It is about time to take a step back and describe what I have been implementing," began Daniel Phillips, referring to his new Tux3 filesystem. He provided a simple ASCII diagram that detailed the filesystem's hierarchical structure, describing each of the elements. About one he noted, "the volume table is a new addition not central to the goals of Tux3, but a nice feature to have given that it comes nearly for free. One Tux3 volume can have an arbitrary number of separate filesystems tucked inside it, indexed by a simple integer parameter at mount time. People say they like this idea and it imposes no significant complexity, so it goes in." Daniel continued:
"Each volume has a metablock pointing at the forward log chain for the volume, a version table that describes the hierarchical relationship between versions (snapshots), an atime table to take care of that horrid legacy Unix feature, and an inode table containing files and attributes of files. [...] Versioning takes place in three places, versioned pointers in the atime btree, versioned extents in a file data btree and versioned attributes in the inode table. [...] Notice the absence of a journal, the functionality of which is provided by forward log elements that I described in the Hammer thread (and will eventually write a separate post about)."
"The big advantage Hammer has over Tux3 is, it is up and running and released in the Dragonfly distro," began Daniel Phillips, offering a comparison between the two filesystem. He continued, "the biggest disadvantage is, it runs on BSD, not Linux, and it so heavily implements functionality that is provided by the VFS and block layer in Linux that a port would be far from trivial. It will likely happen eventually, but probably in about the same timeframe that we can get Tux3 up and stable." This led into a lengthy and interesting technical discussion between Daniel and HAMMER author Matthew Dillon, comparing the design of the two filesystems.
Matthew reviewed the Tux3 notes and replied, "it sounds like Tux3 is using many similar ideas [as HAMMER]. I think you are on the right track. I will add one big note of caution, drawing from my experience implementing HAMMER, because I think you are going to hit a lot of the same issues. I spent 9 months designing HAMMER and 9 months implementing it. During the course of implementing it I wound up throwing away probably 80% of the original design outright." Daniel noted that he's been working on the Tux3 design for around ten years, "and working seriously on the simplifying elements for the last three years or so, either entirely on paper or in related work like ddsnap and LVM3." Matthew cautioned, "I can tell you've been thinking about Tux for a long time. If I had one worry about your proposed implementation it would be in the area of algorithmic complexity. You have to deal with the in-memory cache, the log, the B-Tree, plus secondary indexing for snapshotted elements and a ton of special cases all over the place. Your general lookup code is going to be very, very complex. My original design for HAMMER was a lot more complex (if you can believe it!) then the end result. A good chunk of what I had to do going from concept to reality was deflate a lot of that complexity." The friendly conversation offers a very detailed look at the design choices made in each of these file systems.
"Since everybody seems to be having fun building new filesystems these days, I thought I should join the party, began Daniel Phillips, announcing the Tux3 versioning filesystem. He continued, "Tux3 is a write anywhere, atomic commit, btree based versioning filesystem. As part of this work, the venerable HTree design used in Ext3 and Lustre is getting a rev to better support NFS and possibly become more efficient." Daniel explained:
"The main purpose of Tux3 is to embody my new ideas on storage data versioning. The secondary goal is to provide a more efficient snapshotting and replication method for the Zumastor NAS project, and a tertiary goal is to be better than ZFS."
In his announcement email, Daniel noted that implementation work is underway, "much of the work consists of cutting and pasting bits of code I have developed over the years, for example, bits of HTree and ddsnap. The immediate goal is to produce a working prototype that cuts a lot of corners, for example block pointers instead of extents, allocation bitmap instead of free extent tree, linear search instead of indexed, and no atomic commit at all. Just enough to prove out the versioning algorithms and develop new user interfaces for version control."