If certain requests sit in the drive's write-back cache longer than
others, that increases the probability that the ordering the filesystem
requires, and the elevator provides, becomes skewed once requests are
passed to drive firmware.
The sad, sucky fact is that NCQ starvation implies FLUSH CACHE is more
important than ever, if filesystems want to get ordering correct.
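
To make the "flush as an ordering point" idea concrete, here is a
minimal userspace sketch. It assumes a Linux-like system where fsync()
normally causes the kernel to flush the drive's write cache; whether it
really does depends on the filesystem and mount options. The names
ordered_write() and demo() are made up for illustration.

```python
import os
import tempfile


def write_all(fd, data):
    """Write every byte of `data`, looping over short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def ordered_write(path, first, second):
    """Write `first`, flush it, and only then write `second`.

    The fsync() between the two writes is the ordering point: assuming
    fsync() reaches the drive as a cache flush, `second` cannot become
    durable ahead of `first`.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, first)
        os.fsync(fd)   # ordering point (assumed to trigger a cache flush)
        write_all(fd, second)
        os.fsync(fd)   # make the second write durable as well
    finally:
        os.close(fd)


def demo():
    """Run ordered_write() on a scratch file and return its contents."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "journal.bin")
        ordered_write(path, b"payload;", b"commit")
        with open(path, "rb") as f:
            return f.read()
```

Without the first fsync() the drive would be free to commit the two
writes in either order.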
IDEALLY, according to the SATA protocol spec, we could issue up to 32
NCQ commands to a SATA drive, each marked with the "FUA" bit to force
the command to hit permanent media before returning.
In theory, this NCQ+FUA mode gives the drive maximum freedom to optimize
parallel in-progress commands, decoupling command completion from
command issue -- while still giving the OS complete control of ordering:
because a completed FUA write is already on media, the OS gets an
ordering point simply by letting the SATA tagged command queue empty out.
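
A toy model of that scheme, purely as a sketch: the "drive" below is a
made-up Python class, not real firmware behaviour. It completes queued
FUA writes in whatever order it likes, and the OS side gets its ordering
by draining the queue before issuing the dependent write. Only the queue
depth of 32 comes from the SATA spec; every name here is hypothetical.

```python
import random

NCQ_DEPTH = 32  # SATA NCQ allows up to 32 outstanding tagged commands


class ToyDrive:
    """Hypothetical drive that completes FUA writes in arbitrary order."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.in_flight = []   # FUA writes queued but not yet completed
        self.media = []       # order in which writes reached the media

    def issue(self, name):
        """Queue one FUA write, waiting for a free tag if the queue is full."""
        if len(self.in_flight) >= NCQ_DEPTH:
            self.complete_one()
        self.in_flight.append(name)

    def complete_one(self):
        """The drive picks any queued command; FUA means it is now on media."""
        index = self.rng.randrange(len(self.in_flight))
        self.media.append(self.in_flight.pop(index))

    def drain(self):
        """OS-side ordering point: wait until the tagged queue is empty."""
        while self.in_flight:
            self.complete_one()


def write_with_ordering_point(drive, before, after):
    """Issue `before` writes, drain the queue, then issue `after` writes."""
    for name in before:
        drive.issue(name)
    drive.drain()             # everything in `before` is on media now
    for name in after:
        drive.issue(name)
    drive.drain()
    return drive.media
```

The drive may reorder the writes within each batch, but nothing from the
second batch can reach the media before the first batch is complete, and
no FLUSH CACHE is needed to get that guarantee.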
In practice, NCQ+FUA flat out did not work on early drives, and
performance was well below what you would expect for parallel
write-through command execution. I haven't benchmarked NCQ+FUA in a few
years; it might be worth revisiting.
Jeff