Hi all,
I don't think that the group intended that there be an opendirplus();
rather readdirplus() would simply be called instead of the usual
readdir(). We should clarify that.
Regarding Peter Staubach's comment that no one would ever use the
readdirplus() call: if people weren't performing this workload in
the first place, we wouldn't *need* this sort of call! This call is
specifically targeted at improving "ls -l" performance on large
directories, and Sage has pointed out quite nicely how that might work.
In our case (PVFS), we would essentially perform three phases of
communication with the file system for a readdirplus that was obtaining
full statistics: first grabbing the directory entries, then obtaining
metadata from servers on all objects in bulk, then gathering file sizes
in bulk. The reduction in control message traffic is enormous, and the
concurrency is much greater than in a readdir()+stat()s workload. We'd
never attempt this sort of optimization speculatively, as the cost of
guessing wrong is just too high. We would want to see the call as a
proper VFS operation that we could act upon.
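
To put rough numbers on the message-traffic argument, here is a toy
request-count model in C. It is only a sketch: the struct, the function
names, and the cost assumptions (one request to read the directory, one
metadata request plus one size request per data server for each stat(),
and one bulk request per server in the three-phase case) are
hypothetical and are not taken from the PVFS protocol.

```c
#include <assert.h>
#include <stddef.h>

/* All fields are illustrative assumptions, not PVFS measurements. */
struct fs_shape_sketch {
    long entries;       /* files in the directory */
    long meta_servers;  /* servers holding object metadata */
    long data_servers;  /* servers holding file data (and thus sizes) */
    long stripe_width;  /* data servers each file is spread over */
};

/*
 * readdir() followed by one stat() per entry. Assumed cost: one request
 * for the directory, then for every entry one metadata request plus one
 * size request per data server the file is striped over.
 */
static long requests_readdir_then_stat(const struct fs_shape_sketch *s)
{
    assert(s != NULL);
    return 1 + s->entries * (1 + s->stripe_width);
}

/*
 * Three bulk phases. Assumed cost: one request for the directory
 * entries, one bulk metadata request per metadata server, and one bulk
 * size request per data server.
 */
static long requests_bulk_three_phase(const struct fs_shape_sketch *s)
{
    assert(s != NULL);
    return 1 + s->meta_servers + s->data_servers;
}
```

Under these assumptions, a 10000-entry directory on 4 metadata servers
and 16 data servers, with every file striped over all 16, costs 170001
requests the first way and 21 the second. Real directories of that size
would need more than one request to read in either case.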
The entire readdirplus() operation wasn't intended to be atomic, and in
fact the returned structure has space for an error associated with the
stat() on a particular entry, to allow for implementations that stat()
subsequently and get an error because the object was removed between
when the entry was read out of the directory and when the stat was
performed. I think this fits well with what Andreas and others are
thinking. We should clarify the description appropriately.
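
As a rough illustration of the per-entry error idea, here is a
user-space emulation built from existing POSIX calls (opendir(),
readdir(), fstatat()). The struct dirent_plus_sketch and the function
readdirplus_sketch() are hypothetical names made up for this example;
they are not the layout or signature in the proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result record, not the proposed readdirplus() layout. */
struct dirent_plus_sketch {
    char name[256];      /* entry name, truncated if longer */
    struct stat st;      /* meaningful only when stat_errno == 0 */
    int stat_errno;      /* 0, or the errno from this entry's stat */
};

/*
 * Fill up to max records from the directory at path. Returns the number
 * of records filled, or -1 if the directory itself cannot be opened.
 * A failed stat on one entry (for example ENOENT because the object was
 * removed after its name was read) is recorded in that entry only and
 * does not fail the whole call.
 */
static int readdirplus_sketch(const char *path,
                              struct dirent_plus_sketch *out, int max)
{
    assert(out != NULL && max >= 0);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *de;
    while (n < max && (de = readdir(dir)) != NULL) {
        struct dirent_plus_sketch *e = &out[n];

        strncpy(e->name, de->d_name, sizeof e->name - 1);
        e->name[sizeof e->name - 1] = '\0';

        /* Stat relative to the open directory, without following links. */
        if (fstatat(dirfd(dir), de->d_name, &e->st,
                    AT_SYMLINK_NOFOLLOW) == 0)
            e->stat_errno = 0;
        else
            e->stat_errno = errno;
        n++;
    }

    closedir(dir);
    return n;
}
```

A caller would check stat_errno on each record before touching st, and
simply skip or report the entries where it is nonzero.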
I don't think that we have a readdirpluslite() variation documented yet.
Gary? It would make a lot of sense, except that it should probably have
a better name...
Regarding Andreas's note that he would prefer the statlite() flags to
mean "valid", that makes good sense to me (and would obviously apply to
the so-far even more hypothetical readdirpluslite()). I don't think
there's a lot of value in returning possibly-inaccurate values.
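
To make the "valid" reading concrete, a minimal sketch of how a caller
might consume such flags. The struct statlite_sketch, its fields, and
the SKETCH_VALID_* bits are hypothetical; the real statlite() names and
layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validity bits: a set bit means the field is accurate. */
#define SKETCH_VALID_SIZE  (1u << 0)
#define SKETCH_VALID_MTIME (1u << 1)

/* Hypothetical stand-in for the statlite() result structure. */
struct statlite_sketch {
    unsigned valid_mask;  /* which of the fields below may be trusted */
    int64_t  size;
    int64_t  mtime;
};

/*
 * Copy the size out only if the file system marked it valid.
 * Returns 1 on success, 0 if the size was not filled in, so the caller
 * never acts on a possibly-inaccurate value by accident.
 */
static int sketch_get_size(const struct statlite_sketch *st,
                           int64_t *size_out)
{
    assert(st != NULL && size_out != NULL);
    if ((st->valid_mask & SKETCH_VALID_SIZE) == 0)
        return 0;
    *size_out = st->size;
    return 1;
}
```

With this convention a caller that really needs the size can fall back
to a full stat() when the bit is clear, rather than guessing whether a
returned number is trustworthy.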
Thanks everyone,
Rob
Trond Myklebust wrote: