"Ceph is a distributed network file system designed to provide excellent performance, reliability, and scalability with POSIX semantics. I periodically see frustration on this list with the lack of a scalable GPL distributed file system with sufficiently robust replication and failure recovery to run on commodity hardware, and would like to think that--with a little love--Ceph could fill that gap," announced Sage Weil on the Linux Kernel mailing list. Originally developed as the subject of his PhD thesis, he went on to list the features of the new filesystem, including POSIX semantics, scalability from a few nodes to thousands of nodes, support for petabytes of data, a highly available design with no signle points of failure, n-way replication of data across multiple nodes, automatic data rebalancing as nodes are added and removed, and a Fuse-based client. He noted that a lightweight kernel client is in progress, as is flexible snapshoting, quotas, and improved security. Sage compared Ceph to other similar filesystems:
"In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely on symmetric access by all clients to shared block devices, Ceph separates data and metadata management into independent server clusters, similar to Lustre. Unlike Lustre, however, metadata and storage nodes run entirely in userspace and require no special kernel support. Storage nodes utilize either a raw block device or large image file to store data objects, or can utilize an existing file system (XFS, etc.) for local object storage (currently with weakened safety semantics). File data is striped across storage nodes in large chunks to distribute workload and facilitate high throughputs. When storage nodes fail, data is re-replicated in a distributed fashion by the storage nodes themselves (with some coordination from a cluster monitor), making the system extremely efficient and scalable."
Ulrich Drepper noted a difference between the Linux connect(2) man page and the POSIX specification. The former states, "connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC." The latter reads, "if address is a null address for the protocol, the socket's peer address shall be reset." Ulrich explained that he preferred the description in the Linux man page, but the Linux kernel seems to actually follow the POSIX specification, "is this functionality which got lost over time? Or is the man page wrong and this never was the case? Is this a worthwhile change?"
Alan Cox noted, "we got it from the 1003.4g draft socket specification if I remember rightly." David Miller suggested, "the whole AF_UNSPEC thing I'm almost certain comes from BSD, which has behaved that way for centuries." Alan concurred, "its entirely plausible that [the 1003.4g draft socket specification] got it from 4BSE." Ulrich concluded, "I guess I'll just go ahead and file a problem report with the spec. Maybe the Unix vendors will test their implementations and provide feedback."