When we experimented with this in ext3/4, it was important not to wait when
doing single-threaded fsyncs (a pretty common case), since waiting would just
make them slower.

Also, the wait time for multi-threaded fsyncs should be capped at some fraction
of the time it takes to complete a flush. For example, we had
ATA_CACHE_FLUSH_EXT commands that took, say, 16ms or so to flush, and we waited
one jiffy (4ms), which worked well. It tanked when we used that fixed waiting
time for a high speed device that could execute a flush in, say, 1ms (meaning
we waited 4 times as long as it would have taken to just submit the fsync()).
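
A rough sketch of that heuristic in plain C follows. The function name, its
parameters and the divisor are made up for illustration and are not taken from
the ext3/4 code; it only shows the two rules: do not wait for a lone fsync(),
and otherwise cap the wait at a fraction of the flush time.

```c
#include <assert.h>

/*
 * Hypothetical helper (an assumption, not actual ext3/4 or jbd code):
 * decide how long an fsync() caller should wait for others to join its batch.
 *
 * flush_us    - measured or estimated time to complete one cache flush
 * max_wait_us - fixed upper bound on the wait, e.g. one jiffy (4000us)
 * nr_waiters  - number of threads currently issuing fsync()
 * cap_divisor - wait at most flush_us / cap_divisor (assumed tunable)
 */
unsigned int batch_wait_us(unsigned int flush_us,
                           unsigned int max_wait_us,
                           unsigned int nr_waiters,
                           unsigned int cap_divisor)
{
	unsigned int cap;

	assert(cap_divisor > 0);

	/* A lone fsync() has nobody to batch with; waiting only adds latency. */
	if (nr_waiters <= 1)
		return 0;

	/* Never wait longer than a fraction of what the flush itself costs. */
	cap = flush_us / cap_divisor;

	return cap < max_wait_us ? cap : max_wait_us;
}
```

With a divisor of 4, a 16ms flush still gets the full 4ms wait, while a 1ms
flush gets 0.25ms instead of 4ms, and a single-threaded caller never waits.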

I am still not clear that the scheme that you and Neil are proposing would
really batch up enough flushes to help, though, since you effectively do not
wait.

The workload that we used years back was fs_mark (small files), run single
threaded and then with 2, 4, 8 and 16 threads.
Single threaded should show no slowdown with the various schemes, while
multi-threaded writes should grow with the number of threads up to some
point....
Ric
--