Caution and Latency

Submitted by Jeremy
on October 22, 2007 - 5:28am

"With latencytop, I noticed that the (in memory) atime updates during a kernel build had latencies of 600 msec or longer; this is obviously not so nice behavior. Other EXT3 journal related operations had similar or even longer latencies," Arjan van de Ven reported, describing a "mass priority inversion" caused by, "an interaction between EXT3 and CFQ in that CFQ tries to be fair to everyone, including kjournald. However, in reality, kjournald is 'special' in that it does a lot of journal work". Finally, he offered a tiny patch to resolve the issue, "the patch below makes kjournald of the IOPRIO_CLASS_RT priority to break this priority inversion behavior. With this patch, the latencies for atime updates (and similar operation) go down by a factor of 3x to 4x !"

Andrew Morton took a cautious stance, "seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to 'do more seeks to improve one test case'. Surely other testcases will worsen. What are they?" CFQ author Jens Axboe agreed, "It should not be merged as-is, instead I'll provide a function to do this." Ingo Molnar wasn't convinced, "atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this." He added, "if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks??" Andrew calmy replied, "I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them."


From: Arjan van de Ven
Subject: [patch] Give kjournald a IOPRIO_CLASS_RT io priority
Date: Oct 15, 10:46 am 2007

Subject: Give kjournald a IOPRIO_CLASS_RT io priority
From: Arjan van de Ven <arjan@linux.intel.com>

With latencytop, I noticed that the (in memory) atime updates during a
kernel build had latencies of 600 msec or longer; this is obviously not so
nice behavior. Other EXT3 journal related operations had similar or even
longer latencies.

Digging into this a bit more, it appears to be an interaction between EXT3
and CFQ in that CFQ tries to be fair to everyone, including kjournald.
However, in reality, kjournald is "special" in that it does a lot of journal
work and effectively this leads to a twisted kind of "mass priority
inversion" type of behavior.

The good news is that CFQ already has the infrastructure to make certain
processes special... JBD just wasn't using that quite yet.

The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break
this priority inversion behavior. With this patch, the latencies for atime
updates (and similar operation) go down by a factor of 3x to 4x !


Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c
--- linux-2.6.23-rc9.org/fs/jbd/journal.c	2007-10-02 05:24:52.000000000 +0200
+++ linux-2.6.23-rc9.lt/fs/jbd/journal.c	2007-10-14 00:06:55.000000000 +0200
@@ -35,6 +35,7 @@
 #include <linux/kthread.h>
 #include <linux/poison.h>
 #include <linux/proc_fs.h>
+#include <linux/ioprio.h>
 
 #include <asm/uaccess.h>
 #include <asm/page.h>
@@ -131,6 +132,8 @@ static int kjournald(void *arg)
 	printk(KERN_INFO "kjournald starting.  Commit interval %ld seconds\n",
 			journal->j_commit_interval / HZ);
 
+	current->ioprio =  (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4;
+
 	/*
 	 * And now, wait forever for commit wakeup events.
 	 */
-

From: Andrew Morton Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 15, 11:47 am 2007 On Mon, 15 Oct 2007 10:46:47 -0700 Arjan van de Ven <arjan@infradead.org> wrote: > > Subject: Give kjournald a IOPRIO_CLASS_RT io priority > From: Arjan van de Ven <arjan@linux.intel.com> > > With latencytop, I noticed that the (in memory) atime updates during a > kernel build had latencies of 600 msec or longer; this is obviously not so > nice behavior. Other EXT3 journal related operations had similar or even > longer latencies. > > Digging into this a bit more, it appears to be an interaction between EXT3 > and CFQ in that CFQ tries to be fair to everyone, including kjournald. > However, in reality, kjournald is "special" in that it does a lot of journal > work and effectively this leads to a twisted kind of "mass priority > inversion" type of behavior. > > The good news is that CFQ already has the infrastructure to make certain > processes special... JBD just wasn't using that quite yet. > > The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break > this priority inversion behavior. With this patch, the latencies for atime > updates (and similar operation) go down by a factor of 3x to 4x ! > Seems a pretty fundamental change which could do with some careful benchmarking, methinks. See, your patch amounts to "do more seeks to improve one test case". Surely other testcases will worsen. What are they? > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > > > diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c > --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200 > +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200 > @@ -35,6 +35,7 @@ > #include <linux/kthread.h> > #include <linux/poison.h> > #include <linux/proc_fs.h> > +#include <linux/ioprio.h> > > #include <asm/uaccess.h> > #include <asm/page.h> > @@ -131,6 +132,8 @@ static int kjournald(void *arg) > printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n", > journal->j_commit_interval / HZ); > > + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4; > + Might be worth a code comment? -
From: Jens Axboe Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 15, 12:28 pm 2007 On Mon, Oct 15 2007, Andrew Morton wrote: > On Mon, 15 Oct 2007 10:46:47 -0700 > Arjan van de Ven <arjan@infradead.org> wrote: > > > > > Subject: Give kjournald a IOPRIO_CLASS_RT io priority > > From: Arjan van de Ven <arjan@linux.intel.com> > > > > With latencytop, I noticed that the (in memory) atime updates during a > > kernel build had latencies of 600 msec or longer; this is obviously not so > > nice behavior. Other EXT3 journal related operations had similar or even > > longer latencies. > > > > Digging into this a bit more, it appears to be an interaction between EXT3 > > and CFQ in that CFQ tries to be fair to everyone, including kjournald. > > However, in reality, kjournald is "special" in that it does a lot of journal > > work and effectively this leads to a twisted kind of "mass priority > > inversion" type of behavior. > > > > The good news is that CFQ already has the infrastructure to make certain > > processes special... JBD just wasn't using that quite yet. > > > > The patch below makes kjournald of the IOPRIO_CLASS_RT priority to break > > this priority inversion behavior. With this patch, the latencies for atime > > updates (and similar operation) go down by a factor of 3x to 4x ! > > > > Seems a pretty fundamental change which could do with some careful > benchmarking, methinks. > > See, your patch amounts to "do more seeks to improve one test case". > Surely other testcases will worsen. What are they? Yes, completely agree! I think Arjans patch makes a heap of sense, but some numbers would be great to see. > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > > > > > > diff -purN linux-2.6.23-rc9.org/fs/jbd/journal.c linux-2.6.23-rc9.lt/fs/jbd/journal.c > > --- linux-2.6.23-rc9.org/fs/jbd/journal.c 2007-10-02 05:24:52.000000000 +0200 > > +++ linux-2.6.23-rc9.lt/fs/jbd/journal.c 2007-10-14 00:06:55.000000000 +0200 > > @@ -35,6 +35,7 @@ > > #include <linux/kthread.h> > > #include <linux/poison.h> > > #include <linux/proc_fs.h> > > +#include <linux/ioprio.h> > > > > #include <asm/uaccess.h> > > #include <asm/page.h> > > @@ -131,6 +132,8 @@ static int kjournald(void *arg) > > printk(KERN_INFO "kjournald starting. Commit interval %ld seconds\n", > > journal->j_commit_interval / HZ); > > > > + current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | 4; > > + > > Might be worth a code comment? It should not be merged as-is, instead I'll provide a function to do this. It should also set current->io_context->ioprio_changed. Since no IO has been done yet at this point it doesn't matter. But we should cut a piece of set_task_ioprio() out and provide that as a kernel helper for this sort of thing. Even just writing it as: current->ioprio = (IOPRIO_CLASS_RT << IOPRIO_CLASS_SHIFT) | IOPRIO_NORM; would be more readable. Or perhaps this would suffice, given the above restriction that IO hasn't been done yet: current->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_RT, IOPRIO_NORM); -- Jens Axboe -
From: Ingo Molnar Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 22, 2:10 am 2007 * Jens Axboe <jens.axboe@oracle.com> wrote: > > Seems a pretty fundamental change which could do with some careful > > benchmarking, methinks. > > > > See, your patch amounts to "do more seeks to improve one test case". > > Surely other testcases will worsen. What are they? > > Yes, completely agree! I think Arjans patch makes a heap of sense, but > some numbers would be great to see. Arjan gave the relevant hard numbers: | With latencytop, I noticed that the (in memory) atime updates during a | kernel build had latencies of 600 msec or longer [...] | | With this patch, the latencies for atime updates (and similar | operation) go down by a factor of 3x to 4x ! atime update latencies went down by a factor of 3x-4x ... but what bothers me even more is the large picture. Linux's development is still fundamentally skewed towards bandwidth (which goes up with hardware advances anyway), while the focus on latencies is very lacking (which users do care about much more and which usually does _not_ improve with improved hardware), so i cannot see why we shouldnt apply this. Reminds me of the illogical, almost superstitious resistence against the relatime patch. (which is not in 2.6.24 mind you - killed for good) if bandwidth hurts anywhere, it will be pointed out and fixed, we've got like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth problems. But _finally_ we now have desktop latency tools, hard numbers and patches that fix them, but what do we do ... we put up extra roadblocks?? so lets just goddamn apply this _trivial_ patch. This isnt an intrusive 1000 line rewrite that is hard to revert. If it causes any bandwidth problems, it will be just as trivial to undo. If we do anything else we just stiffle the still young and very much under-represented "lets fix latencies that bothers people" movement. If anything we need _positive_ discrimination for latency related fixes (which treatment this fix does not need at all - all it needs is _equal_ footing with the countless bandwidth patches that go into the kernel all the time), otherwise it will never take off and become as healthy as bandwidth optimizations. Ok? Ingo -
From: Andrew Morton Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 22, 2:23 am 2007 On Mon, 22 Oct 2007 11:10:57 +0200 Ingo Molnar <mingo@elte.hu> wrote: > > * Jens Axboe <jens.axboe@oracle.com> wrote: > > > > Seems a pretty fundamental change which could do with some careful > > > benchmarking, methinks. > > > > > > See, your patch amounts to "do more seeks to improve one test case". > > > Surely other testcases will worsen. What are they? > > > > Yes, completely agree! I think Arjans patch makes a heap of sense, but > > some numbers would be great to see. > > Arjan gave the relevant hard numbers: > > | With latencytop, I noticed that the (in memory) atime updates during a > | kernel build had latencies of 600 msec or longer [...] > | > | With this patch, the latencies for atime updates (and similar > | operation) go down by a factor of 3x to 4x ! > > atime update latencies went down by a factor of 3x-4x ... > > but what bothers me even more is the large picture. Linux's development > is still fundamentally skewed towards bandwidth (which goes up with > hardware advances anyway), while the focus on latencies is very lacking > (which users do care about much more and which usually does _not_ > improve with improved hardware), so i cannot see why we shouldnt apply > this. Reminds me of the illogical, almost superstitious resistence > against the relatime patch. (which is not in 2.6.24 mind you - killed > for good) Try `mount -o relatime' and prepare to be surprised ;) > if bandwidth hurts anywhere, it will be pointed out and fixed, we've got > like tons of bandwidth benchmarks and it's _easy_ to fix bandwidth > problems. But _finally_ we now have desktop latency tools, hard numbers > and patches that fix them, but what do we do ... we put up extra > roadblocks?? > > so lets just goddamn apply this _trivial_ patch. This isnt an intrusive > 1000 line rewrite that is hard to revert. If it causes any bandwidth > problems, it will be just as trivial to undo. If we do anything else we > just stiffle the still young and very much under-represented "lets fix > latencies that bothers people" movement. If anything we need _positive_ > discrimination for latency related fixes (which treatment this fix does > not need at all - all it needs is _equal_ footing with the countless > bandwidth patches that go into the kernel all the time), otherwise it > will never take off and become as healthy as bandwidth optimizations. > Ok? > I think the situation is that we've asked for some additional what-can-be-hurt-by-this testing. Yes, we could sling it out there and wait for the reports. But often that's a pretty painful process and regressions can be discovered too late for us to do anything about them. -
From: Ingo Molnar Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 22, 2:27 am 2007 * Andrew Morton <akpm@linux-foundation.org> wrote: > > but what bothers me even more is the large picture. Linux's > > development is still fundamentally skewed towards bandwidth (which > > goes up with hardware advances anyway), while the focus on latencies > > is very lacking (which users do care about much more and which > > usually does _not_ improve with improved hardware), so i cannot see > > why we shouldnt apply this. Reminds me of the illogical, almost > > superstitious resistence against the relatime patch. (which is not > > in 2.6.24 mind you - killed for good) > > Try `mount -o relatime' and prepare to be surprised ;) i mean this one: http://people.redhat.com/mingo/relatime-patches/improve-relatime.patch this actually makes relatime practical. Ingo -
From: Ingo Molnar Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 22, 2:40 am 2007 * Andrew Morton <akpm@linux-foundation.org> wrote: > > so lets just goddamn apply this _trivial_ patch. This isnt an > > intrusive 1000 line rewrite that is hard to revert. If it causes any > > bandwidth problems, it will be just as trivial to undo. If we do > > anything else we just stiffle the still young and very much > > under-represented "lets fix latencies that bothers people" movement. > > If anything we need _positive_ discrimination for latency related > > fixes (which treatment this fix does not need at all - all it needs > > is _equal_ footing with the countless bandwidth patches that go into > > the kernel all the time), otherwise it will never take off and > > become as healthy as bandwidth optimizations. Ok? > > I think the situation is that we've asked for some additional > what-can-be-hurt-by-this testing. > > Yes, we could sling it out there and wait for the reports. But often > that's a pretty painful process and regressions can be discovered too > late for us to do anything about them. reverting this oneliner is trivial. Finding bandwidth problems and tracking them down to this oneliner change is relatively easy too. Finding latency problems and fixing them is _not_ trivial. Boot up a Linux desktop and start OOo or firefox, and measure the time it takes to start the app up. 10-20 seconds on a top-of-the-line quad-core 3.2 GHz system - which is a shame. Same box can do in excess of 1GB/sec block IO. Yes, one could blame the apps but in reality most of the blame is mostly on the kernel side. We do not make bloat and latency suckage apparent enough to user-space (due to lack of intelligent instrumentation), we make latencies hard to fix, we have an acceptance bias towards bandwidth fixes (because they are easier to measure and justify) - and that's all what it takes to let such a situation get out of control. and i can bring up the scheduler as an example. CFS broke the bandwidth performance of one particular app and it took only a few days to get it back under control. But it was months to get good latency behavior out of the scheduler. And that is with the help of excellent scheduler instrumentation. In the IO space the latency situation is much, much worse. Really. Ingo -
From: Andrew Morton Subject: Re: [patch] Give kjournald a IOPRIO_CLASS_RT io priority Date: Oct 22, 2:49 am 2007 None of which is an argument for simply not bothering to do a bit more developer testing before merging.

Where is latencytop?

Martin Ebourne (not verified)
on
October 23, 2007 - 4:31am

Can't find latencytop with google. Where is it?

It's probably a homegrown

intgr
on
October 23, 2007 - 7:02am

It's probably a homegrown tool by Ingo Molnar. I am confident that he plans to release it sooner or later; but for now, it seems that it's not public yet.

According to a LWN comment of his, he wants to make tracking latency problems as easy as Intel did with PowerTop. (PowerTop is considered a great success of kernel instrumentation)

Waiting for the "official"

Paolo (not verified)
on
October 24, 2007 - 1:20am

Waiting for the "official" latencytop you can always use a simple script like this:

#!/bin/bash

function Field()
{
	if [ ! -f "$2" ]; then
		return 1
	fi
	grep -m1 -e "$1" "$2" | cut -d: -f2- | tr -d '[[:space:]]'
}

function print_stat()
{
	for task in /proc/*/task/*; do
		name=$(Field "^Name" $task/status)
		pid=$(basename $task)
		value=$(Field "^$1" $task/sched)
		if [ -z "$pid" ] || [ -z "$value" ]; then
			continue
		fi
		echo -e "$value\t$pid\t$name"
	done
}

echo "### BLOCK MAX ###"
print_stat "se.block_max" | sort -k1nr | head -n10

echo
echo "### WAIT MAX ###"
print_stat "se.wait_max" | sort -k1nr | head -n10

atime updates should be left

muwlgr
on
October 23, 2007 - 12:27pm

atime updates should be left aside from the main filesystem traffic. Kept in separate queue or so. And not cause write-outs as often as other inode modifications cause it. 'relatime' is a step in the research, good but not final.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.