Re: [PATCH 2/2 v2] rcu,cleanup: simplify the code when cpu is dying

Previous thread: [PATCH 1/2 v2] rcu,cleanup: move synchronize_sched_expedited() out of sched.c by Lai Jiangshan on Tuesday, October 19, 2010 - 11:12 pm. (6 messages)

Next thread: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7 by Andrew Dickinson on Wednesday, October 20, 2010 - 12:20 am. (3 messages)
From: Lai Jiangshan
Date: Tuesday, October 19, 2010 - 11:13 pm

When we handle cpu notify DYING, the whole system is stopped except
current CPU, so we can touch any data, and we remove the orphan_cbs_tail
and send the callbacks to the dest CPU directly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 rcutree.c        |   81 ++++++++++++++-----------------------------------------
 rcutree.h        |   16 ++--------
 rcutree_plugin.h |    8 ++---
 rcutree_trace.c  |    4 +-
 4 files changed, 31 insertions(+), 78 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ccdc04c..669d7fe 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -67,9 +67,6 @@ static struct lock_class_key rcu_node_class[NUM_RCU_LVLS];
 	.gpnum = -300, \
 	.completed = -300, \
 	.onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname.onofflock), \
-	.orphan_cbs_list = NULL, \
-	.orphan_cbs_tail = &structname.orphan_cbs_list, \
-	.orphan_qlen = 0, \
 	.fqslock = __RAW_SPIN_LOCK_UNLOCKED(&structname.fqslock), \
 	.n_force_qs = 0, \
 	.n_force_qs_ngp = 0, \
@@ -984,53 +981,31 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
 #ifdef CONFIG_HOTPLUG_CPU
 
 /*
- * Move a dying CPU's RCU callbacks to the ->orphan_cbs_list for the
- * specified flavor of RCU.  The callbacks will be adopted by the next
- * _rcu_barrier() invocation or by the CPU_DEAD notifier, whichever
- * comes first.  Because this is invoked from the CPU_DYING notifier,
- * irqs are already disabled.
+ * Move a dying CPU's RCU callbacks to online CPU's callback list.
+ * Synchronization is not required because this function executes
+ * in stop_machine() context.
  */
-static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
+static void rcu_send_cbs_to_online(struct rcu_state *rsp)
 {
 	int i;
+	/* current DYING CPU is cleared in the cpu_online_mask */
+	int receive_cpu = cpumask_any(cpu_online_mask);
 	struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+	struct rcu_data *receive_rdp = per_cpu_ptr(rsp->rda, receive_cpu);
 
 	if (rdp->nxtlist == ...
From: Paul E. McKenney
Date: Wednesday, October 20, 2010 - 12:25 pm

Queued along with the documentation/comment patch below, thank you!!!
(Of course, please let me know if you see problems with my patch.)

One remaining question...  You use cpumask_any() to select the destination
CPU, which sounds good until you look at its definition.  The problem
is that cpumask_any() always chooses the lowest-numbered online CPU.
So imagine a (say) 64-CPU system and suppose that CPU 0 remains online.
Suppose further that the other 63 CPUs each execute some workload that
generates lots of RCU callbacks (perhaps creating then removing a large
source tree), and periodically go offline and come back online.

All of the RCU callbacks from CPUs 1-63 could easily end up getting
dumped onto CPU 0's callback lists.  It is easy to imagine that CPU 0
might not be able to invoke these callbacks as fast as the other CPUs
could generate them.

Or am I missing something?

If I am not missing something, could you please adjust so that the
adopting CPU varies, perhaps by scanning across the CPUs?  Feel free
to update this patch or to send a separate patch, whichever is easiest
for you.


------------------------------------------------------------------------

rcu: update documentation/comments for Lai's adoption patch

Lai's RCU-callback immediate-adoption patch changes the RCU tracing
output, so update tracing.txt.  Also update a few comments to clarify
the synchronization design.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 7f852db..3bb386d 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -122,7 +122,8 @@ o	"ci" is the number of RCU callbacks that have been invoked for
 	been registered in absence of CPU-hotplug activity.
 
 o	"co" is the number of RCU callbacks that have been orphaned due to
-	this CPU going offline.
+	this CPU going offline.  These orphaned callbacks have been moved
+	to an arbitrarily chosen online CPU.
 
 o	"ca" ...
From: Lai Jiangshan
Date: Friday, October 22, 2010 - 12:35 am

It happens in the worst case. It may also happen before this patch.

Before this patch, the callback move to the receive-CPU who handles the CPU_DEAD
event, and this CPU may be always cpu#0 in the worst case, the problem happens.

And it's not help if I introduce a choose_receive_cpu_very_smart(),
Suppose further that the other 63 CPUs each execute some workload that
generates lots of RCU callbacks (perhaps creating then removing a large
source tree), and periodically go offline and come back online. In worse
case, in some period, there is only cpu#0 online, So all of the RCU callbacks
from CPUs 1-63 could easily end up getting dumped onto CPU 0's callback lists. 
It is easy to imagine that CPU 0 might not be able to invoke these callbacks
as fast as the other CPUs could generate them.

Another bad case(it may happens without this patch/with this patch
/with choose_receive_cpu_very_smart()):
	Live-Lock, suppose cpu#A and cpu#B periodically go offline and come
	back online, the callback may be moved from A to B and from B to A
	periodically, no callback is handled.

To fix these problems(it does really very hardly happen), we must force
all adopted callbacks are called before next cpu-offline. so we can use
work_on_cpu() or rcu_barrier() to do this. To make the code simpler, I will
use rcu_barrier().

Thanks.
Lai
--

From: Paul E. McKenney
Date: Friday, October 22, 2010 - 9:10 am

Agreed, it -could- happen before in the worst case, but it required very
bad luck for the task adopting the callbacks to always be the same.
In contrast, cpumask_any() will always pick on the same CPU.


This approach is nice, but requires extensive testing -- a start would
be a script that randomly onlines and offlines CPUs while rcutorture
is running in the background.  If you have not already done so, could
you please give this an over-the-weekend test on the largest system
you have access to?

							Thanx, Paul
--

Previous thread: [PATCH 1/2 v2] rcu,cleanup: move synchronize_sched_expedited() out of sched.c by Lai Jiangshan on Tuesday, October 19, 2010 - 11:12 pm. (6 messages)

Next thread: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7 by Andrew Dickinson on Wednesday, October 20, 2010 - 12:20 am. (3 messages)