> On Thu, 2010-04-22 at 16:31 -0400, Vivek Goyal wrote:
>> On Thu, Apr 22, 2010 at 09:59:14AM +0200, Corrado Zoccolo wrote:
>> > Hi Miklos,
>> > On Wed, Apr 21, 2010 at 6:05 PM, Miklos Szeredi <mszeredi@suse.cz> wrote:
>> > > Jens, Corrado,
>> > >
>> > > Here's a graph showing the number of issued but not yet completed
>> > > requests versus time for CFQ and NOOP schedulers running the tiobench
>> > > benchmark with 8 threads:
>> > >
>> > > http://www.kernel.org/pub/linux/kernel/people/mszeredi/blktrace/queue-depth.jpg
>> > >
>> > > It shows pretty clearly the performance problem is because CFQ is not
>> > > issuing enough requests to fill the bandwidth.
>> > >
>> > > Is this the correct behavior of CFQ or is this a bug?
>> > This is the expected behavior from CFQ, even if it is not optimal,
>> > since we aren't able to identify multi-spindle disks yet.
>>
>> In the past we were of the opinion that for sequential workloads
>> multi-spindle disks would not matter much, as readahead logic (in the OS
>> and possibly in hardware also) will help. For random workloads we don't
>> idle on the single cfqq anyway, so that is fine. But my tests now seem to
>> be telling a different story.
>>
>> I also have one FC link to an HP EVA and I am running an increasing
>> number of sequential readers to see if throughput goes up as the number
>> of readers goes up. The results are with noop and cfq. I do flush OS
>> caches across the runs but I have no control over caching on the HP EVA.
>>
>> Kernel=2.6.34-rc5
>> DIR=/mnt/iostestmnt/fio DEV=/dev/mapper/mpathe
>> Workload=bsr iosched=cfq Filesz=2G bs=4K
>> =========================================================================
>> job   Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
>> ---   ---  --  ------------  -----------  -------------  -----------
>> bsr   1    1   135366        59024        0              0
>> bsr   1    2   124256        126808       0              0
>> bsr   1    4   132921        341436       0              0
>> bsr   1    8   129807        392904       0              0
>> bsr   1    16  129988        773991       0              0
>>
>> Kernel=2.6.34-rc5
>> DIR=/mnt/iostestmnt/fio DEV=/dev/mapper/mpathe
>> Workload=bsr iosched=noop Filesz=2G bs=4K
>> =========================================================================
>> job   Set  NR  ReadBW(KB/s)  MaxClat(us)  WriteBW(KB/s)  MaxClat(us)
>> ---   ---  --  ------------  -----------  -------------  -----------
>> bsr   1    1   126187        95272        0              0
>> bsr   1    2   185154        72908        0              0
>> bsr   1    4   224622        88037        0              0
>> bsr   1    8   285416        115592       0              0
>> bsr   1    16  348564        156846       0              0
>>
>
> These numbers are very similar to what I got.
>
>> So in the NOOP case, throughput shot up to 348MB/s but CFQ remains more
>> or less constant, at about 130MB/s.
>>
>> So at least in this case, a single sequential CFQ queue is not keeping
>> the disk busy enough.
>>
>> I am wondering why my testing results were different in the past. Maybe
>> it was a different piece of hardware and behavior varies across hardware?
>
> Probably. I haven't seen this type of behavior on other hardware.
>
>> Anyway, if that's the case, then we probably need to allow IO from
>> multiple sequential readers and keep a watch on throughput. If throughput
>> drops then reduce the number of parallel sequential readers. Not sure how
>> much code that is, but with multiple cfqqs going in parallel, ioprio
>> logic will more or less stop working in CFQ (on multi-spindle hardware).
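
To make the "allow more parallel readers and watch throughput" idea
concrete, here is a rough userspace sketch in Python. It is not kernel
code and not an actual CFQ patch. The names next_parallel_readers() and
simulate() are made up for illustration, and the doubling/halving step and
the 5% tolerance are assumptions. The bandwidth numbers are the noop
results quoted above, used as a stand-in for what the device can deliver
at each level of parallelism.

```python
def next_parallel_readers(current, prev_bw, cur_bw, max_readers, tolerance=0.05):
    """Pick how many sequential readers to dispatch in parallel next.

    Hypothetical policy: back off (halve) if throughput dropped by more
    than `tolerance` since the last sample, otherwise probe upward (double).
    """
    if cur_bw < prev_bw * (1.0 - tolerance):
        return max(1, current // 2)
    return min(max_readers, current * 2)


# Read bandwidth in KB/s per number of readers, from the noop table above.
NOOP_BW = {1: 126187, 2: 185154, 4: 224622, 8: 285416, 16: 348564}


def simulate(bw_by_readers, steps=6):
    """Run the feedback loop against a fixed bandwidth table.

    Returns the number of parallel readers used at each step.
    """
    max_readers = max(bw_by_readers)
    readers, prev_bw = 1, 0
    history = []
    for _ in range(steps):
        cur_bw = bw_by_readers[readers]
        history.append(readers)
        readers = next_parallel_readers(readers, prev_bw, cur_bw, max_readers)
        prev_bw = cur_bw
    return history


if __name__ == "__main__":
    print(simulate(NOOP_BW))
```

With the noop numbers the loop keeps widening until it reaches 16 readers,
since throughput never drops. On a device where extra readers hurt, the
same loop would back off instead. It says nothing about how to preserve
ioprio semantics once several cfqqs are dispatching at the same time.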