Re: [BUG] long freezes on thinkpad t60

Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]
From: Linus Torvalds
Date: Monday, June 18, 2007 - 9:34 am

On Mon, 18 Jun 2007, Ingo Molnar wrote:

I really think this the the wrong approach, although *testing* it makes 
sense.

I think we need to handle loops that take, release, and then immediately 
re-take differently.

Such loops are _usually_ of the form where they really just release the 
lock for some latency reason, but in this case I think it's actually just 
a bug.

That code does:

	if (unlikely(p->array || task_running(rq, p))) {

to decide if it needs to just unlock and repeat, but then to decide if it 
need to *yield* it only uses *one* of those tests (namely 

	preempted = !task_running(rq, p);
	..
	if (preempted)
		yield();

and I think that's just broken. It basically says:

 - if the task is running, I will busy-loop on getting/releasing the 
   task_rq_lock

and that is the _real_ bug here.

Trying to make the spinlocks do somethign else than what they do is just 
papering over the real bug. The real bug is that anybody who just 
busy-loops getting a lock is wasting resources so much that we should not 
be at all surprised that some multi-core or NUMA situations will get 
starvation.

Blaming some random Core 2 hardware implementation issue that just makes 
it show up is wrong. It's a software bug, plain and simple.

So how about this diff? The diff looks big, but the *code* is actually 
simpler and shorter, I just added tons of comments, which is what blows it 
up.

The new *code* looks like this:

	repeat:
		/* Unlocked, optimistic looping! */
	        rq = task_rq(p);
	        while (task_running(rq, p))
	                cpu_relax();

		/* Get the *real* values */
	        rq = task_rq_lock(p, &flags);
	        running = task_running(rq, p);
	        array = p->array;
	        task_rq_unlock(rq, &flags);

		/* Check them.. */
	        if (unlikely(running)) {
	                cpu_relax();
	                goto repeat;
	        }

	        if (unlikely(array)) {
	                yield();
	                goto repeat;
	        }

and while I haven't tested it, it looks fairly obviously correct, even if 
it's a bit subtle.

Basically, that first "while()" loop is done entirely without any locking 
at all, and so it's possibly "incorrect", but we don't care. Both the 
runqueue used, and the "task_running()" check might be the wrong tests, 
but they won't oops - they just mean that we might get the wrong results 
due to lack of locking.

So then we get the proper (and careful) rq lock, and check the 
running/runnable state _safely_. And if it turns out that our 
quick-and-dirty and unsafe loop was wrong after all, we just go back and 
try it all again.

Safe, simple, efficient. And we don't ever hold the lock for long times, 
and we will never *loop* by taking, releasing and re-taking the lock. 

Hmm? Untested, I know. Maybe I overlooked something. But even the 
generated assembly code looks fine (much better than it looked before!)

		Linus

----
 kernel/sched.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 13cdab3..66445e1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1159,21 +1159,72 @@ void wait_task_inactive(struct task_struct *p)
 {
 	unsigned long flags;
 	struct rq *rq;
-	int preempted;
+	struct prio_array *array;
+	int running;
 
 repeat:
+	/*
+	 * We do the initial early heuristics without holding
+	 * any task-queue locks at all. We'll only try to get
+	 * the runqueue lock when things look like they will
+	 * work out!
+	 */
+	rq = task_rq(p);
+
+	/*
+	 * If the task is actively running on another CPU
+	 * still, just relax and busy-wait without holding
+	 * any locks.
+	 *
+	 * NOTE! Since we don't hold any locks, it's not
+	 * even sure that "rq" stays as the right runqueue!
+	 * But we don't care, since "task_running()" will
+	 * return false if the runqueue has changed and p
+	 * is actually now running somewhere else!
+	 */
+	while (task_running(rq, p))
+		cpu_relax();
+
+	/*
+	 * Ok, time to look more closely! We need the rq
+	 * lock now, to be *sure*. If we're wrong, we'll
+	 * just go back and repeat.
+	 */
 	rq = task_rq_lock(p, &flags);
-	/* Must be off runqueue entirely, not preempted. */
-	if (unlikely(p->array || task_running(rq, p))) {
-		/* If it's preempted, we yield.  It could be a while. */
-		preempted = !task_running(rq, p);
-		task_rq_unlock(rq, &flags);
+	running = task_running(rq, p);
+	array = p->array;
+	task_rq_unlock(rq, &flags);
+
+	/*
+	 * Was it really running after all now that we
+	 * checked with the proper locks actually held?
+	 *
+	 * Oops. Go back and try again..
+	 */
+	if (unlikely(running)) {
 		cpu_relax();
-		if (preempted)
-			yield();
 		goto repeat;
 	}
-	task_rq_unlock(rq, &flags);
+
+	/*
+	 * It's not enough that it's not actively running,
+	 * it must be off the runqueue _entirely_, and not
+	 * preempted!
+	 *
+	 * So if it wa still runnable (but just not actively
+	 * running right now), it's preempted, and we should
+	 * yield - it could be a while.
+	 */
+	if (unlikely(array)) {
+		yield();
+		goto repeat;
+	}
+
+	/*
+	 * Ahh, all good. It wasn't running, and it wasn't
+	 * runnable, which means that it will never become
+	 * running in the future either. We're all done!
+	 */
 }
 
 /***
-
Previous message: [thread] [date] [author]
Next message: [thread] [date] [author]

Messages in current thread:
[BUG] long freezes on thinkpad t60, Miklos Szeredi, (Thu May 24, 5:04 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 5:54 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Thu May 24, 7:03 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 7:10 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Thu May 24, 7:28 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 7:42 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 7:44 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Thu May 24, 10:09 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 2:01 pm)
Re: [BUG] long freezes on thinkpad t60, Henrique de Moraes H ..., (Thu May 24, 3:08 pm)
Re: [BUG] long freezes on thinkpad t60, Kok, Auke, (Thu May 24, 3:13 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu May 24, 11:58 pm)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Fri May 25, 2:51 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Thu Jun 14, 9:04 am)
Re: [BUG] long freezes on thinkpad t60, Chuck Ebbert, (Fri Jun 15, 2:25 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Sat Jun 16, 3:37 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Sun Jun 17, 2:46 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Sun Jun 17, 11:43 pm)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 12:24 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 1:12 am)
Re: [BUG] long freezes on thinkpad t60, Andrew Morton, (Mon Jun 18, 1:20 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 1:25 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 1:31 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 1:34 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 2:18 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 2:38 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 2:44 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 3:18 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 5:36 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 6:10 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Mon Jun 18, 9:34 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Mon Jun 18, 10:41 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Mon Jun 18, 10:48 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 11:00 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Mon Jun 18, 11:02 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Mon Jun 18, 11:25 am)
Re: [BUG] long freezes on thinkpad t60, Ravikiran G Thirumalai, (Mon Jun 18, 9:22 pm)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Wed Jun 20, 2:36 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Wed Jun 20, 10:34 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 12:30 am)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Thu Jun 21, 12:38 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:39 am)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Thu Jun 21, 4:09 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 8:50 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 9:01 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 9:08 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 9:32 am)
Re: [BUG] long freezes on thinkpad t60, Chuck Ebbert, (Thu Jun 21, 9:44 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 10:31 am)
Re: [BUG] long freezes on thinkpad t60, Eric Dumazet, (Thu Jun 21, 11:29 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 11:44 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 12:35 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 12:56 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:09 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 1:10 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:12 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 1:14 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:16 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:18 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:23 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 1:30 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 1:36 pm)
Re: [BUG] long freezes on thinkpad t60, Eric Dumazet, (Thu Jun 21, 1:36 pm)
[patch] spinlock debug: make looping nicer, Ingo Molnar, (Thu Jun 21, 1:42 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Thu Jun 21, 1:48 pm)
Re: [patch] spinlock debug: make looping nicer, Linus Torvalds, (Thu Jun 21, 1:58 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Thu Jun 21, 2:06 pm)
Re: [patch] spinlock debug: make looping nicer, Ingo Molnar, (Thu Jun 21, 2:15 pm)
Re: [patch] spinlock debug: make looping nicer, Jarek Poplawski, (Fri Jun 22, 12:00 am)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Fri Jun 22, 1:17 am)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Fri Jun 22, 3:38 am)
Re: [BUG] long freezes on thinkpad t60, Miklos Szeredi, (Sat Jun 23, 3:36 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Sat Jun 23, 9:39 am)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Sun Jun 24, 11:45 pm)
Re: [BUG] long freezes on thinkpad t60, Nick Piggin, (Tue Jun 26, 1:42 am)
Re: [BUG] long freezes on thinkpad t60, Jarek Poplawski, (Tue Jun 26, 3:56 am)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Tue Jun 26, 10:23 am)
Re: [BUG] long freezes on thinkpad t60, Nick Piggin, (Tue Jun 26, 10:23 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Tue Jun 26, 11:04 pm)
Re: [BUG] long freezes on thinkpad t60, Nick Piggin, (Tue Jun 26, 11:20 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Wed Jun 27, 12:47 pm)
Re: [BUG] long freezes on thinkpad t60, Ingo Molnar, (Wed Jun 27, 1:10 pm)
Re: [BUG] long freezes on thinkpad t60, Davide Libenzi, (Wed Jun 27, 1:17 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Wed Jun 27, 3:11 pm)
Re: [BUG] long freezes on thinkpad t60, Davide Libenzi, (Wed Jun 27, 4:30 pm)
Re: [BUG] long freezes on thinkpad t60, Linus Torvalds, (Wed Jun 27, 5:46 pm)
Re: [BUG] long freezes on thinkpad t60, Davide Libenzi, (Wed Jun 27, 8:03 pm)
Re: [BUG] long freezes on thinkpad t60, Nick Piggin, (Mon Jul 2, 12:06 am)