"I'm pleased to announce [the] 7'th and final release of the distributed storage subsystem (DST)," Evgeniy Polyakov stated, completing the TODO list on the project's web page. He titled the release, "squizzed black-out of the dancing back-aching hippo", noting, "it clearly shows my condition". New features in this release include checksum support, extended auto-configuration for detecting and auto-enabling checksums if supported by the remote host, new sysfs files for marking a given node as clean (in-sync) or dirty (not-in-sync), and numerous bug fixes.
Evgeniy released the first version of his distributed storage subsystem in July of 2007. In September he explained that this was the first step in a larger distributed filesystem project he was planning. In late October, Andrew Morton noted that the work looked ready to be merged into his -mm kernel.
Andrew Morton responded favorably to Evgeniy Polyakov's most recent release of his distributed storage subsystem, "I went back and re-read last month's discussion and I'm not seeing any reason why we shouldn't start thinking about merging this." He then asked, "how close is it to that stage? A peek at your development blog indicates that things are still changing at a moderate rate?" Evgeniy replied:
"I completed storage layer development itself, the only remaining todo item is to implement [a] new redundancy algorithm, but I did not see major demand on that, so it will stay for now with low priority. I will use DST as a transport layer for [a] distributed filesystem, and probably that will require additional features, I have no clean design so far, but right now I have nothing in the pipe to commit to DST."
Evgeniy Polyakov announced a new version of his distributed storage subsystem, "this release includes [a] mirroring algorithm extension, which allows [the subsystem] to store [the] 'age' of the given node on the underlying media." He went on to explain why this was useful:
"In this case, if [a] failed node gets new media, which does not contain [the] correct 'age' (unique id assigned to the whole storage during initialization time), the whole node will be marked as dirty and eventually resynced.
"This allows [it] to have [a] completely transparent failure recovery - [the] failed node can be just turned off, its hardware fixed and then turned on. DST core will detect [the] connection reset and automatically reconnect when [the] node is ready and resync if needed without any special administrator's steps."
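The 'age' check Evgeniy describes can be sketched roughly as follows. This is an illustrative model only, not DST's actual code; the names (`STORAGE_AGE`, `node_needs_resync`) and the use of `None` for missing media are assumptions made for the example.

```python
# Hypothetical sketch of the 'age'-based resync decision: the storage is
# assigned a unique id at initialization time, and each node's media is
# stamped with it. Media that lacks the correct age is treated as new.

STORAGE_AGE = 0xDEADBEEF  # unique id assigned to the whole storage at init

def node_needs_resync(media_age, marked_dirty=False):
    """A node is resynced if its media does not carry the storage's 'age',
    or if the administrator explicitly marked it dirty (e.g. via sysfs)."""
    return marked_dirty or media_age != STORAGE_AGE

# A replaced disk carries no (or a stale) age, so the whole node is
# marked dirty and eventually resynced:
assert node_needs_resync(media_age=None)
assert node_needs_resync(media_age=0x12345678)

# A node that dropped off and reconnected with intact media is clean:
assert not node_needs_resync(media_age=STORAGE_AGE)
```

The point of the scheme is that failure recovery needs no administrator intervention beyond fixing the hardware: the age stamp alone tells the core whether a resync is required.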
"I'm pleased to announce [the] fourth release of the distributed storage subsystem, which allows [you] to form a storage [block device] on top of remote and local nodes, which in turn can be exported to another storage [block device] as a node to form tree-like storage [block devices]," Evgeniy Polyakov stated on the Linux Kernel mailing list. The new release includes a new configuration interface and several bug fixes.
Network device driver and SATA subsystem maintainer, Jeff Garzik, was not impressed with the concept, "[distributed block devices] are not very useful, because it still relies on a useful filesystem sitting on top of the DBS." He went on to explain the problem, "it devolves into one of two cases: (1) multi-path much like today's SCSI, with distributed filesystem arbitration to ensure coherency, or (2) the filesystem running on top of the DBS is on a single host, and thus, a single point of failure (SPOF)." He proposed instead that time would be better spent developing a POSIX-only distributed filesystem, "in contrast, a distributed filesystem offers far more scalability, eliminates single points of failure, and offers more room for optimization and redundancy across the cluster." Jeff went on to caution, "a distributed filesystem is also much more complex, which is why distributed block devices are so appealing :)" When Lustre was pointed out as an existing option, Jeff noted, "Lustre is tilted far too much towards high-priced storage, and needs improvement before it could be considered for mainline."
Evgeniy Polyakov, listed as the connector and w1 subsystem maintainer, announced the first release of his distributed storage subsystem, "which allows [you] to form storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages." He describes the features of this new block device: "zero additional allocations in the common fast path not counting network allocations; zero-copy sending if supported by device using sendpage(); ability to use any implemented algorithm (linear algo implemented); pluggable mapping algorithms; failover recovery in case of broken link; ability to suspend remote node for maintenance without breaking dataflow to [other] nodes (if supported by algorithm and block layer) and without turning down main node; initial autoconfiguration (ability to request remote node size and use that dynamic data during array setup time); non-blocking network data processing; support for any kind of network media (not limited to tcp or inet protocols); no need for any special tools for data processing (like special userspace applications) except for configuration; userspace and kernelspace targets."
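A linear mapping algorithm of the kind the feature list mentions ("linear algo implemented") amounts to concatenating node capacities and routing each sector to the node whose range contains it. The sketch below is an assumption-laden illustration of that idea, not DST's implementation; the class, node names, and bisect-based lookup are all invented for the example.

```python
# Illustrative sketch of linear (concatenating) storage mapping: sector N
# lands on the node whose cumulative size range covers N.
import bisect

class LinearMap:
    def __init__(self, nodes):
        # nodes: list of (name, size_in_sectors) tuples
        self.names = [name for name, _ in nodes]
        self.bounds = []  # cumulative end offset of each node
        total = 0
        for _, size in nodes:
            total += size
            self.bounds.append(total)

    def map_sector(self, sector):
        """Return (node_name, offset_within_node) for a logical sector."""
        i = bisect.bisect_right(self.bounds, sector)
        start = self.bounds[i - 1] if i else 0
        return self.names[i], sector - start

m = LinearMap([("local", 1000), ("remote-a", 2000), ("remote-b", 500)])
assert m.map_sector(0) == ("local", 0)
assert m.map_sector(1500) == ("remote-a", 500)
assert m.map_sector(3200) == ("remote-b", 200)
```

Because the mapping layer is pluggable, a mirroring or striping algorithm could replace this lookup without touching the network transport underneath it.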
In his blog, Evgeniy noted a similarity to the recently discussed DRBD. In the recent announcement he compares his solution to iSCSI and NBD noting the following advantages: "non-blocking processing without busy loops; small, pluggable architecture; failover recovery (reconnect to remote target); autoconfiguration; no additional allocations; very simple; works with different network protocols; and storage can be formed on top of remote nodes and be exported simultaneously".