When we handle cpu notify DYING, the whole system is stopped except
current CPU, so we can touch any data, and we remove the orphan_cbs_tail
and send the callbacks to the dest CPU directly.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
rcutree.c | 81 ++++++++++++++-----------------------------------------
rcutree.h | 16 ++--------
rcutree_plugin.h | 8 ++---
rcutree_trace.c | 4 +-
4 files changed, 31 insertions(+), 78 deletions(-)
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ccdc04c..669d7fe 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -67,9 +67,6 @@ static struct lock_class_key rcu_node_class[NUM_RCU_LVLS];
.gpnum = -300, \
.completed = -300, \
.onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname.onofflock), \
- .orphan_cbs_list = NULL, \
- .orphan_cbs_tail = &structname.orphan_cbs_list, \
- .orphan_qlen = 0, \
.fqslock = __RAW_SPIN_LOCK_UNLOCKED(&structname.fqslock), \
.n_force_qs = 0, \
.n_force_qs_ngp = 0, \
@@ -984,53 +981,31 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
#ifdef CONFIG_HOTPLUG_CPU
/*
- * Move a dying CPU's RCU callbacks to the ->orphan_cbs_list for the
- * specified flavor of RCU. The callbacks will be adopted by the next
- * _rcu_barrier() invocation or by the CPU_DEAD notifier, whichever
- * comes first. Because this is invoked from the CPU_DYING notifier,
- * irqs are already disabled.
+ * Move a dying CPU's RCU callbacks to online CPU's callback list.
+ * Synchronization is not required because this function executes
+ * in stop_machine() context.
*/
-static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
+static void rcu_send_cbs_to_online(struct rcu_state *rsp)
{
int i;
+ /* current DYING CPU is cleared in the cpu_online_mask */
+ int receive_cpu = cpumask_any(cpu_online_mask);
struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
+ struct rcu_data *receive_rdp = per_cpu_ptr(rsp->rda, receive_cpu);
if (rdp->nxtlist == ...Queued along with the documentation/comment patch below, thank you!!! (Of course, please let me know if you see problems with my patch.) One remaining question... You use cpumask_any() to select the destination CPU, which sounds good until you look at its definition. The problem is that cpumask_any() always chooses the lowest-numbered online CPU. So imagine a (say) 64-CPU system and suppose that CPU 0 remains online. Suppose further that the other 63 CPUs each execute some workload that generates lots of RCU callbacks (perhaps creating then removing a large source tree), and periodically go offline and come back online. All of the RCU callbacks from CPUs 1-63 could easily end up getting dumped onto CPU 0's callback lists. It is easy to imagine that CPU 0 might not be able to invoke these callbacks as fast as the other CPUs could generate them. Or am I missing something? If I am not missing something, could you please adjust so that the adopting CPU varies, perhaps by scanning across the CPUs? Feel free to update this patch or to send a separate patch, whichever is easiest for you. ------------------------------------------------------------------------ rcu: update documentation/comments for Lai's adoption patch Lai's RCU-callback immediate-adoption patch changes the RCU tracing output, so update tracing.txt. Also update a few comments to clarify the synchronization design. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 7f852db..3bb386d 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt @@ -122,7 +122,8 @@ o "ci" is the number of RCU callbacks that have been invoked for been registered in absence of CPU-hotplug activity. o "co" is the number of RCU callbacks that have been orphaned due to - this CPU going offline. + this CPU going offline. These orphaned callbacks have been moved + to an arbitrarily chosen online CPU. o "ca" ...
It happens in the worst case. It may also happen before this patch. Before this patch, the callback move to the receive-CPU who handles the CPU_DEAD event, and this CPU may be always cpu#0 in the worst case, the problem happens. And it's not help if I introduce a choose_receive_cpu_very_smart(), Suppose further that the other 63 CPUs each execute some workload that generates lots of RCU callbacks (perhaps creating then removing a large source tree), and periodically go offline and come back online. In worse case, in some period, there is only cpu#0 online, So all of the RCU callbacks from CPUs 1-63 could easily end up getting dumped onto CPU 0's callback lists. It is easy to imagine that CPU 0 might not be able to invoke these callbacks as fast as the other CPUs could generate them. Another bad case(it may happens without this patch/with this patch /with choose_receive_cpu_very_smart()): Live-Lock, suppose cpu#A and cpu#B periodically go offline and come back online, the callback may be moved from A to B and from B to A periodically, no callback is handled. To fix these problems(it does really very hardly happen), we must force all adopted callbacks are called before next cpu-offline. so we can use work_on_cpu() or rcu_barrier() to do this. To make the code simpler, I will use rcu_barrier(). Thanks. Lai --
Agreed, it -could- happen before in the worst case, but it required very bad luck for the task adopting the callbacks to always be the same. In contrast, cpumask_any() will always pick on the same CPU. This approach is nice, but requires extensive testing -- a start would be a script that randomly onlines and offlines CPUs while rcutorture is running in the background. If you have not already done so, could you please give this an over-the-weekend test on the largest system you have access to? Thanx, Paul --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
| Linux Kernel Mailing List |
