Miklos Szeredi posted a patch to allow files to be accessed as directories, offering the example of accessing the contents of a compressed tarball as you would any other directory. He noted that this is not the only application of the patch: "Others might suggest accessing streams, resource forks or extended attributes through such an interface. However this patch only deals with the non-directory case, so directories would be excluded from that interface. But otherwise this patch doesn't limit the uses of the 'file as directory' concept in any way. It just adds the infrastructure to support these whacky beasts." Al Viro took an interest in the patch, noting, "I'll look through the patch tonight; it sounds interesting, assuming that we don't run into serious crap with locking and revalidation logics." An interesting discussion between Miklos and Al about the implementation of the patch followed.
Miklos went on to explain how the functionality works using mounts with special properties, "if a non-directory object is accessed with a trailing slash, then the filesystem may opt to let the file be accessed as a directory. In this case 'something' (as supplied by the filesystem) is mounted on top of the non-directory object." He then explained the following special properties of these mounts: "If there's no trailing slash after the file name, the mount won't be followed, even if the path resolution would otherwise follow mounts; The mount only stays there while it is referenced by some external object, like a pwd or an open file. When it is no longer referenced, it is automatically unmounted; Unlike 'real' mounts, this won't block unlink(2) or rename(2) on the underlying object."
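The lookup rule described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the patch's code: `Node`, `resolve` and `tar_enter` are hypothetical names, the `enter` field merely stands in for the filesystem's ->enter() hook, and the "mount" is modelled as simply continuing the walk in the tree that the hook returns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    name: str
    is_dir: bool = False
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Hypothetical stand-in for the filesystem's ->enter() hook: given the
    # file node, return the root of the tree to mount over it, or None.
    enter: Optional[Callable[["Node"], Optional["Node"]]] = None


def resolve(root: Node, path: str) -> Node:
    """Walk `path` from `root`, entering files only when a slash follows them."""
    parts = [p for p in path.split("/") if p]
    trailing_slash = path.endswith("/")
    node = root
    for i, part in enumerate(parts):
        if part not in node.children:
            raise FileNotFoundError(path)
        node = node.children[part]
        is_last = i == len(parts) - 1
        # A component is "followed by a slash" if it is not the last one,
        # or if the whole path ends in a slash.
        wants_dir = trailing_slash or not is_last
        if wants_dir and not node.is_dir:
            mounted = node.enter(node) if node.enter else None
            if mounted is None:
                raise NotADirectoryError(path)
            node = mounted  # continue the walk inside the mounted tree
    return node


def tar_enter(file_node: Node) -> Node:
    # Pretend the tarball contains a single file, foo/bar.
    bar = Node("bar")
    foo = Node("foo", is_dir=True, children={"bar": bar})
    return Node(file_node.name + "/", is_dir=True, children={"foo": foo})


tarball = Node("foo.tar.gz", enter=tar_enter)
plain = Node("notes.txt")  # a file whose filesystem does not opt in
root = Node("", is_dir=True,
            children={"foo.tar.gz": tarball, "notes.txt": plain})
```

In this model `resolve(root, "foo.tar.gz")` returns the file itself, while `resolve(root, "foo.tar.gz/")` and `resolve(root, "foo.tar.gz/foo/bar")` go through the hook. The sketch builds a fresh tree on every entry; the thread below discusses why a real implementation would want to share one.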
From: Miklos Szeredi [email blocked]
To: linux-kernel, [email blocked]
Subject: [RFC PATCH] file as directory
Date: Tue, 22 May 2007 20:48:49 +0200

Why do we want this?
--------------------

That depends on who you ask. My answer is this:

  'foo.tar.gz/foo/bar' or
  'foo.tar.gz/contents/foo/bar'

or something similar.

Others might suggest accessing streams, resource forks or extended
attributes through such an interface. However this patch only deals
with the non-directory case, so directories would be excluded from
that interface.

But otherwise this patch doesn't limit the uses of the "file as
directory" concept in any way. It just adds the infrastructure to
support these whacky beasts.

How is it done?
---------------

(See this [1] thread for more discussion on the subject)

When a non-directory object is accessed without a trailing slash, then
path resolution returns the object itself as usual.

If a non-directory object is accessed with a trailing slash, then the
filesystem may opt to let the file be accessed as a directory. In
this case "something" (as supplied by the filesystem) is mounted on
top of the non-directory object.

This mount will have special properties:

 - If there's no trailing slash after the file name, the mount
   won't be followed, even if the path resolution would otherwise
   follow mounts.

 - The mount only stays there while it is referenced by some external
   object, like a pwd or an open file. When it is no longer
   referenced, it is automatically unmounted.

 - Unlike "real" mounts, this won't block unlink(2) or rename(2) on
   the underlying object.

Compatibility with existing systems
-----------------------------------

Filesystems which enable "file as directory" semantics, might possibly
break existing applications. For example an app could conceivably
check if an object is a directory by appending a slash to the name and
trying some filesystem operation. This application might be confused
by allowing such operations to succeed on non-directory objects.
However in practice this sort of behavior seems to be rare.

The other question is, how well unmodified applications cope with
user-supplied paths which have a slash after the name of a
non-directory object.

Command line utilities seem to cope very well, since they don't have
too much path "sanitization". Bash also seems perfectly capable of
dealing with such beasts, with filename completion and everything.

More complex apps like emacs and file browsers have more problems, but
in some cases they do actually work as expected. Notably if the
supplied path has at least one additional component below the
non-directory object. So while this doesn't work in emacs etc.:

  foo.tar.gz/

this usually does:

  foo.tar.gz/foo

It is probably trivial to teach these programs to not be too clever
with path names. It should also be possible to make apps be aware and
explicitly support files as directories.

Implementation details
----------------------

See comments and Documentation/* in the patch.

The patch is careful not to touch the fastpaths in the path
resolution:

 - Only check ->enter() if ->lookup() is not defined and there's a
   trailing slash. This happens very infrequently, since most apps
   check the file type before trying to enter a directory

 - Since the "directory on file" mount is removed on leave, most
   files won't have anything mounted over them. In these cases
   follow_mount() and friends will be just as fast as before this
   patch.

There's only a negligible slowdown for crossing a mountpoint and a
very minor slowdown for accessing files, which currently have a
"directory on file" mount over them.

How to try it out
-----------------

This needs quite a bit of fiddling. First get the files from

  http://www.kernel.org/pub/linux/kernel/people/mszeredi/file-as-directory/

 - Get the CVS version of fuse, patch it with fuse-enter.patch.

 - Compile avfs-enter.c as instructed at the top of that file

 - Get the CVS version of AVFS and compile it, you should get a
   working avfsd daemon. After mounting with "./avfsd /avfs", try

     ls -l /avfs/usr/src/linux-2.6.21.tar.gz#/

 - Patch a kernel with the below patch. This is against
   2.6.22-rc1-mm1, but with some effort should apply to other recent
   kernels.

 - Reboot and look for "app/pid enter name/" lines in dmesg. Those
   are when an app is attempting to access a non-directory with a
   slash, and failing of course, because no filesystem supports this
   yet.

 - Mount the avfs filesystem. It is important to mount it on /avfs.

 - Mount the avfs-enter filesystem somewhere, e.g. /tmp/avfs

 - Try

     ls -l /tmp/avfs/usr/src/linux-2.6.21.tar.gz
     ls -l /tmp/avfs/usr/src/linux-2.6.21.tar.gz/
     cd /tmp/avfs/usr/src/linux-2.6.21.tar.gz/

[1] http://article.gmane.org/gmane.comp.file-systems.reiserfs.general/10861

Signed-off-by: Miklos Szeredi [email blocked]
---
 Documentation/filesystems/Locking |    7 +
 Documentation/filesystems/vfs.txt |   12 +-
 fs/dcache.c                       |    2
 fs/namei.c                        |  121 ++++++++++++++++----
 fs/namespace.c                    |  223 +++++++++++++++++++++++++++++++++-----
 include/linux/dcache.h            |   20 +++
 include/linux/fs.h                |    2
 include/linux/mount.h             |    4
 include/linux/namei.h             |    3
 9 files changed, 338 insertions(+), 56 deletions(-)
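The compatibility worry raised in the posting (an application that tests for a directory by appending a slash and trying a filesystem operation) can be made concrete with a short sketch. `looks_like_directory` and `demo` are hypothetical names chosen for illustration; the only real calls are `os.stat` and the `tempfile` helpers. The sketch assumes a Linux system without the patch, where stat(2) on "file/" fails with ENOTDIR.

```python
import os
import tempfile


def looks_like_directory(path):
    """Heuristic an application might use: append '/' and try stat()."""
    try:
        os.stat(path.rstrip("/") + "/")
    except (NotADirectoryError, FileNotFoundError):
        return False
    return True


def demo():
    # Build a scratch directory containing one regular file and probe both.
    with tempfile.TemporaryDirectory() as scratch:
        tarball = os.path.join(scratch, "foo.tar.gz")
        open(tarball, "w").close()
        return looks_like_directory(scratch), looks_like_directory(tarball)
```

On a filesystem that enabled "file as directory" semantics, the second probe would presumably start succeeding, which is exactly the kind of application the posting says might be confused.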
From: Al Viro [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Tue, 22 May 2007 23:10:46 +0100

On Tue, May 22, 2007 at 08:48:49PM +0200, Miklos Szeredi wrote:
> Why do we want this?
> --------------------
>
> That depends on who you ask. My answer is this:
>
>   'foo.tar.gz/foo/bar' or
>   'foo.tar.gz/contents/foo/bar'
>
> or something similar.
>
> Others might suggest accessing streams, resource forks or extended
> attributes through such an interface. However this patch only deals
> with the non-directory case, so directories would be excluded from
> that interface.
>
> But otherwise this patch doesn't limit the uses of the "file as
> directory" concept in any way. It just adds the infrastructure to
> support these whacky beasts.
>
> How is it done?
> ---------------
>
> (See this [1] thread for more discussion on the subject)
>
> When a non-directory object is accessed without a trailing slash, then
> path resolution returns the object itself as usual.
>
> If a non-directory object is accessed with a trailing slash, then the
> filesystem may opt to let the file be accessed as a directory. In
> this case "something" (as supplied by the filesystem) is mounted on
> top of the non-directory object.
>
> This mount will have special properties:
>
>  - If there's no trailing slash after the file name, the mount
>    won't be followed, even if the path resolution would otherwise
>    follow mounts.
>
>  - The mount only stays there while it is referenced by some external
>    object, like a pwd or an open file. When it is no longer
>    referenced, it is automatically unmounted.
>
>  - Unlike "real" mounts, this won't block unlink(2) or rename(2) on
>    the underlying object.

Interesting... How do you deal with mount propagation and things like
mount --move?

As for unlink... How do you deal with having that thing
mounted, mounting something _under_ it (so that vfsmount would be kept
busy) and then unlinking that sucker?

I'll look through the patch tonight; it sounds interesting, assuming that
we don't run into serious crap with locking and <shudder> revalidation
logics.
From: Miklos Szeredi [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 08:36:04 +0200

> > When a non-directory object is accessed without a trailing slash, then
> > path resolution returns the object itself as usual.
> >
> > If a non-directory object is accessed with a trailing slash, then the
> > filesystem may opt to let the file be accessed as a directory. In
> > this case "something" (as supplied by the filesystem) is mounted on
> > top of the non-directory object.
> >
> > This mount will have special properties:
> >
> >  - If there's no trailing slash after the file name, the mount
> >    won't be followed, even if the path resolution would otherwise
> >    follow mounts.
> >
> >  - The mount only stays there while it is referenced by some external
> >    object, like a pwd or an open file. When it is no longer
> >    referenced, it is automatically unmounted.
> >
> >  - Unlike "real" mounts, this won't block unlink(2) or rename(2) on
> >    the underlying object.
>
> Interesting... How do you deal with mount propagation and things like
> mount --move?

Moving (or doing other mount operations on) an ancestor shouldn't be a
problem. Moving this mount itself is not allowed, and neither is
doing bind or pivot_root. Maybe bind could be allowed...

When doing recursive bind on ancestor, these mounts are skipped.

> As for unlink... How do you deal with having that thing
> mounted, mounting something _under_ it (so that vfsmount would be kept
> busy) and then unlinking that sucker?

Yeah, that's a good point. Current patch doesn't deal with that.
Simplest solution could be to disallow submounting these. Don't think
it makes much sense anyway.

> I'll look through the patch tonight; it sounds interesting, assuming that
> we don't run into serious crap with locking and <shudder> revalidation
> logics.

Revalidation shouldn't be a problem. We'll just end up with an unhashed
dentry with a mount over it, which will be detached when the vfsmount
ref is dropped.

Miklos
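The lifetime rule Miklos relies on here (the mount lives only while something references it, and it does not block unlink of the underlying file) can be modelled in a few lines. This is a toy under stated assumptions: `DirOnFileMount`, `enter`, `unlink` and the `mounts` table are hypothetical names, and a plain counter stands in for the vfsmount reference count.

```python
class DirOnFileMount:
    """Toy stand-in for a vfsmount placed over a regular file."""

    def __init__(self, table, file_id):
        self.table = table
        self.file_id = file_id
        self.refs = 0

    def get(self):
        # Taken by e.g. a working directory or an open file.
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            # Last reference gone: detach automatically. Compare identity
            # so that a newer mount for the same file would survive.
            if self.table.get(self.file_id) is self:
                del self.table[self.file_id]


mounts = {}


def enter(file_id):
    """Reuse the existing mount over `file_id`, or create one on demand."""
    mount = mounts.get(file_id)
    if mount is None:
        mount = mounts[file_id] = DirOnFileMount(mounts, file_id)
    return mount.get()


def unlink(files, file_id):
    # Unlike a "real" mount, a dir-on-file mount does not make this fail.
    files.discard(file_id)
```

For example, after `cwd = enter("a")` and `fd = enter("a")`, calling `unlink(files, "a")` removes the name while the mount stays in `mounts`; it disappears only once both `cwd.put()` and `fd.put()` have run.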
From: Al Viro [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 08:03:06 +0100

On Wed, May 23, 2007 at 08:36:04AM +0200, Miklos Szeredi wrote:
> > Interesting... How do you deal with mount propagation and things like
> > mount --move?
>
> Moving (or doing other mount operations on) an ancestor shouldn't be a
> problem. Moving this mount itself is not allowed, and neither is
> doing bind or pivot_root. Maybe bind could be allowed...

Eh... Arbitrary limitations are fun, aren't they?

> When doing recursive bind on ancestor, these mounts are skipped.

What about clone copying your namespace?
What about MNT_SLAVE stuff being set up prior to that lookup?

More interesting question: should independent lookups of that sucker
on different paths end up with the same superblock (and vfsmount for
each) or should we get fully independent mount on each? The latter
would be interesting wrt cache coherency...

> > As for unlink... How do you deal with having that thing
> > mounted, mounting something _under_ it (so that vfsmount would be kept
> > busy) and then unlinking that sucker?
>
> Yeah, that's a good point. Current patch doesn't deal with that.
> Simplest solution could be to disallow submounting these. Don't think
> it makes much sense anyway.

Arbitrary limitations... (and that's where revalidate horrors come in, BTW).
BTW^2: what if fs mounted that way will happen to have such node itself?

I'm not saying that it's unfeasible or won't lead to interesting things,
but it really needs semantics done right...
From: Miklos Szeredi [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 09:19:17 +0200

> > > Interesting... How do you deal with mount propagation and things like
> > > mount --move?
> >
> > Moving (or doing other mount operations on) an ancestor shouldn't be a
> > problem. Moving this mount itself is not allowed, and neither is
> > doing bind or pivot_root. Maybe bind could be allowed...
>
> Eh... Arbitrary limitations are fun, aren't they?

But these mounts _are_ special. There is really no point in moving or
pivoting them.

> > When doing recursive bind on ancestor, these mounts are skipped.
>
> What about clone copying your namespace?

In that case they are cloned, but only those survive which have refs
in the new namespace.

> What about MNT_SLAVE stuff being set up prior to that lookup?

These mounts are not propagated. Or at least I hope so. Propagation
stuff is a bit too complicated for my poor little brain.

> More interesting question: should independent lookups of that sucker
> on different paths end up with the same superblock (and vfsmount for
> each) or should we get fully independent mount on each? The latter
> would be interesting wrt cache coherency...

I think they should be the same superblock, same dentry. What would
be the advantage of doing otherwise?

> > > As for unlink... How do you deal with having that thing
> > > mounted, mounting something _under_ it (so that vfsmount would be kept
> > > busy) and then unlinking that sucker?
> >
> > Yeah, that's a good point. Current patch doesn't deal with that.
> > Simplest solution could be to disallow submounting these. Don't think
> > it makes much sense anyway.
>
> Arbitrary limitations... (and that's where revalidate horrors come in, BTW).
> BTW^2: what if fs mounted that way will happen to have such node itself?

I think doing this recursively should be allowed. "Releasing last ref
cleans up the mess" should work in that case.

> I'm not saying that it's unfeasible or won't lead to interesting things,
> but it really needs semantics done right...

Agreed :)

Miklos
From: Al Viro [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 08:36:58 +0100

On Wed, May 23, 2007 at 09:19:17AM +0200, Miklos Szeredi wrote:
> > Eh... Arbitrary limitations are fun, aren't they?
>
> But these mounts _are_ special. There is really no point in moving or
> pivoting them.

pivoting - probably true, moving... why not?

> > What about MNT_SLAVE stuff being set up prior to that lookup?
>
> These mounts are not propagated. Or at least I hope so. Propagation
> stuff is a bit too complicated for my poor little brain.

Er... These mounts might not be propagated, but what about a bind
over another instance of such file in master tree?

> I think they should be the same superblock, same dentry. What would
> be the advantage of doing otherwise?

Then you are going to have interesting time with locking in final mntput().

BTW, what about having several links to the same file? You have i_mutex
on the inode, so serialization of those is not a problem, but...

> I think doing this recursively should be allowed. "Releasing last ref
> cleans up the mess" should work in that case.

Releasing the last reference will lead to cascade of umounts in that
case... IOW, need to be careful with locking.
From: Miklos Szeredi [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 10:05:21 +0200

> > > Eh... Arbitrary limitations are fun, aren't they?
> >
> > But these mounts _are_ special. There is really no point in moving or
> > pivoting them.
>
> pivoting - probably true, moving... why not?

I don't see any use for that. But indeed, it should not be too hard to
do.

> > > What about MNT_SLAVE stuff being set up prior to that lookup?
> >
> > These mounts are not propagated. Or at least I hope so. Propagation
> > stuff is a bit too complicated for my poor little brain.
>
> Er... These mounts might not be propagated, but what about a bind
> over another instance of such file in master tree?

So your question is, which mount takes priority on the lookup? It
probably should be the propagated real mount, rather than the
dir-on-file one, shouldn't it?

> > I think they should be the same superblock, same dentry. What would
> > be the advantage of doing otherwise?
>
> Then you are going to have interesting time with locking in final mntput().

Final mntput of what?

> BTW, what about having several links to the same file? You have i_mutex
> on the inode, so serialization of those is not a problem, but...

Sorry, I lost it...

> > I think doing this recursively should be allowed. "Releasing last ref
> > cleans up the mess" should work in that case.
>
> Releasing the last reference will lead to cascade of umounts in that
> case... IOW, need to be careful with locking.

I think it's done right: detach_mnt() with namespace_sem and
vfsmount_lock, then release locks, and path_release(&old_nd).

If the recursion is extremely deep we could have stack overflow
problems though, aargh...

Miklos
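The stack-depth concern at the end of this message can be illustrated outside the kernel. In the toy below each nested dir-on-file mount pins the mount it sits on, so dropping the last reference to the innermost one cascades outward. `Mount` and `put` are hypothetical names, locking is ignored entirely, and the loop is one possible way to keep the cascade from consuming a stack frame per level; it is not a description of what the patch does.

```python
class Mount:
    """Toy nested mount; a child holds one reference on its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.refs = 0
        self.mounted = True
        if parent is not None:
            parent.refs += 1  # the child mount pins the mount it sits on


def put(mount, detached):
    """Drop one reference and detach every mount whose count hits zero.

    Written as a loop instead of recursion, so the depth of the cascade
    does not translate into stack depth.
    """
    while mount is not None:
        mount.refs -= 1
        if mount.refs > 0:
            return
        mount.mounted = False
        detached.append(mount.name)
        mount = mount.parent  # releasing this mount drops its pin on the parent


def build_chain(depth):
    """Return the innermost mount of a chain of `depth` nested mounts."""
    mount = None
    for level in range(depth):
        mount = Mount("level-%d" % level, parent=mount)
    return mount
```

Taking one reference on the innermost mount of `build_chain(100000)` and then calling `put` on it detaches all levels in a single loop, a depth at which a naively recursive Python version would hit the interpreter's recursion limit.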
From: Al Viro [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 09:29:18 +0100

On Wed, May 23, 2007 at 10:05:21AM +0200, Miklos Szeredi wrote:
> > Er... These mounts might not be propagated, but what about a bind
> > over another instance of such file in master tree?
>
> So your question is, which mount takes priority on the lookup? It
> probably should be the propagated real mount, rather than the
> dir-on-file one, shouldn't it?

There might be dragons in that area...

> > > I think they should be the same superblock, same dentry. What would
> > > be the advantage of doing otherwise?
> >
> > Then you are going to have interesting time with locking in final mntput().
>
> Final mntput of what?

When the last reference to your mount goes away.

> > BTW, what about having several links to the same file? You have i_mutex
> > on the inode, so serialization of those is not a problem, but...
>
> Sorry, I lost it...

Say /foo/bar/a is such a file.

cd /foo/bar
ln a b

now do lookups on a/ and b/

What happens?
From: Miklos Szeredi [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 11:03:08 +0200

> On Wed, May 23, 2007 at 10:05:21AM +0200, Miklos Szeredi wrote:
> > > Er... These mounts might not be propagated, but what about a bind
> > > over another instance of such file in master tree?
> >
> > So your question is, which mount takes priority on the lookup? It
> > probably should be the propagated real mount, rather than the
> > dir-on-file one, shouldn't it?
>
> There might be dragons in that area...
>
> > > > I think they should be the same superblock, same dentry. What would
> > > > be the advantage of doing otherwise?
> > >
> > > Then you are going to have interesting time with locking in final mntput().
> >
> > Final mntput of what?
>
> When the last reference to your mount goes away.

I still don't get it where the superblock comes in. The locking is
"interesting" in there, yes. And I haven't completely convinced
myself it's right, let alone something that won't easily be screwed up
in the future. So there's definitely room for thought there.

But how does it matter if two different paths have the same sb or a
different sb mounted over them?

> > > BTW, what about having several links to the same file? You have i_mutex
> > > on the inode, so serialization of those is not a problem, but...
> >
> > Sorry, I lost it...
>
> Say /foo/bar/a is such a file.
>
> cd /foo/bar
> ln a b
>
> now do lookups on a/ and b/
>
> What happens?

The same dentry is mounted over each one. The contents of the
directory should only depend on the contents of the underlying inode.
The path leading up to it is completely irrelevant.

Miklos
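Miklos's answer to the hard-link question (the directory depends only on the underlying inode, not on the path used to reach it) can be checked from user space. The sketch below keys a cache on the device and inode numbers reported by `os.stat`; `mounted_root_for`, `_mounted` and `demo` are hypothetical names, and the dictionary only imitates "the same dentry is mounted over each one". It assumes a filesystem that supports hard links.

```python
import os
import tempfile

# Hypothetical cache: one "mounted root" object per underlying inode.
_mounted = {}


def mounted_root_for(path):
    """Return the object that would be mounted over `path`, keyed by inode."""
    st = os.stat(path)
    key = (st.st_dev, st.st_ino)  # identifies the inode, not the name
    return _mounted.setdefault(key, {"inode": key})


def demo():
    # Recreate Al's scenario: "ln a b", then look up both names.
    with tempfile.TemporaryDirectory() as scratch:
        a = os.path.join(scratch, "a")
        b = os.path.join(scratch, "b")
        open(a, "w").close()
        os.link(a, b)
        return mounted_root_for(a) is mounted_root_for(b)
```

Because both names resolve to the same inode, both lookups receive the same object, which is the behaviour Miklos describes; Al's follow-up below is about what serializes concurrent callers, something this sketch does not attempt to model.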
From: Al Viro [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 10:58:24 +0100

On Wed, May 23, 2007 at 11:03:08AM +0200, Miklos Szeredi wrote:
> I still don't get it where the superblock comes in. The locking is
> "interesting" in there, yes. And I haven't completely convinced
> myself it's right, let alone something that won't easily be screwed up
> in the future. So there's definitely room for thought there.
>
> But how does it matter if two different paths have the same sb or a
> different sb mounted over them?

Because then you get a slew of fun issues with dropping the final reference
to vfsmount vs. lookup on another place. What hold do you have on that
superblock and when do you switch from "oh, called ->enter() on the same
inode again, return vfsmount over the same superblock" to "need to
initialize that damn superblock, all mounts are gone"?

> The same dentry is mounted over each one. The contents of the
> directory should only depend on the contents of the underlying inode.
> The path leading up to it is completely irrelevant.

So what kind of exclusion do you have for ->enter()? None?
From: Miklos Szeredi [email blocked]
Subject: Re: [RFC PATCH] file as directory
Date: Wed, 23 May 2007 12:14:28 +0200

> On Wed, May 23, 2007 at 11:03:08AM +0200, Miklos Szeredi wrote:
> > I still don't get it where the superblock comes in. The locking is
> > "interesting" in there, yes. And I haven't completely convinced
> > myself it's right, let alone something that won't easily be screwed up
> > in the future. So there's definitely room for thought there.
> >
> > But how does it matter if two different paths have the same sb or a
> > different sb mounted over them?
>
> Because then you get a slew of fun issues with dropping the final reference
> to vfsmount vs. lookup on another place. What hold do you have on that
> superblock and when do you switch from "oh, called ->enter() on the same
> inode again, return vfsmount over the same superblock" to "need to
> initialize that damn superblock, all mounts are gone"?
>
> > The same dentry is mounted over each one. The contents of the
> > directory should only depend on the contents of the underlying inode.
> > The path leading up to it is completely irrelevant.
>
> So what kind of exclusion do you have for ->enter()? None?
>

So really these issues, are about how do we get hold of the superblock
to mount. I think that should be a filesystem internal problem, and I
suspect the easiest solution is to just have a permanent meta
superblock for these dir-on-file mounts.

Miklos
Like for reiserfs4?
This idea was discussed a lot as a feature of reiserfs4. There were a lot of problems, for example with anything that does recursive scanning of directories, with links, etc.
Instead of creating something like /home/foo/bar.tgz/bar, why not use something like /meta/home/foo/bar.tgz/bar?
"Old" applications will work the same. "Aware" applications will use the /meta directory.
The /meta tree could also be backed by whatever mechanism fits, even a FUSE filesystem.
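The commenter's /meta proposal amounts to a path translation that "aware" applications opt into. A minimal sketch, assuming the /meta prefix from the comment and a caller-supplied predicate that says which real paths are regular files (`split_meta_path` and `is_regular_file` are hypothetical names, not part of any existing tool):

```python
import posixpath


def split_meta_path(path, is_regular_file, prefix="/meta"):
    """Split a /meta path into (real path, path inside the file or None)."""
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError("not under %s: %r" % (prefix, path))
    parts = [p for p in path[len(prefix):].split("/") if p]
    current = "/"
    for i, part in enumerate(parts):
        current = posixpath.join(current, part)
        if is_regular_file(current):
            # Everything after the first regular file names a member of it.
            return current, "/".join(parts[i + 1:])
    return current, None  # an ordinary directory path, nothing to enter
```

With `/home/foo/bar.tgz` known to be a regular file, `/meta/home/foo/bar.tgz/bar` splits into the archive `/home/foo/bar.tgz` and the member `bar`, while the unprefixed path `/home/foo/bar.tgz/bar` is never reinterpreted, so "old" applications see no change.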
ALBODs?
Didn't this idea come up for Reiser4, like, 3 years ago? And whatever happened to ALBODs ("Application Logical Bundles of Data") from, oh, 1999? And where does FUSE fit in all this?
Similar features in the GNU Hurd and Plan 9
It may be worth adding that the GNU Hurd and Plan 9 have had such features for several years (e.g., the Hurd has a `tarfs' "translator" that allows tarballs to be "mounted" and many other similar things, although mounting does not occur automatically).
The Hurd's flexible VFS framework also allows file system servers ("translators") to implement all kinds of semantics, including files-as-directories, directories-as-files, and what not.
FUSE?
This needs to be implemented as a kernel patch and not as a FUSE filesystem ... why?
RTFA, this is about
RTFA, this is about extending FUSE so that fuse filesystems (like tarfs) can do this. The patchset is written by the FUSE maintainer.