Currently we iterate the full machine looking for a matching core_id/nb
for the percore and the amd northbridge stuff , using a smaller topology
mask makes sense.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
---
arch/x86/kernel/cpu/perf_event_amd.c | 2 +-
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 67e2202..5a3b7b8 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -323,7 +323,7 @@ static void amd_pmu_cpu_starting(int cpu)
nb_id = amd_get_nb_id(cpu);
WARN_ON_ONCE(nb_id == BAD_APICID);
- for_each_online_cpu(i) {
+ for_each_cpu(i, topology_core_cpumask(cpu)) {
nb = per_cpu(cpu_hw_events, i).amd_nb;
if (WARN_ON_ONCE(!nb))
continue;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 354d1de..ad70c2c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1111,7 +1111,7 @@ static void intel_pmu_cpu_starting(int cpu)
if (!ht_enabled(cpu))
return;
- for_each_online_cpu(i) {
+ for_each_cpu(i, topology_thread_cpumask(cpu)) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
--
1.7.3
--
Does topology_thread_cpumask() include offline cpus? I tried looking at --
The problem is not only at offline, but also at online between CPUs going online. I don't think the patch is a good idea and it doesn't even have any advantages either since this is a initialization only slow path. -Andi --
I didn't see the problem.
Assume logical cpu 3, 7 are 2 threads in a core, and they are
plug/unpluged as below sequence,
CPU 3 offline, CPU7 offline, CPU3 online, CPU7 online
1. After cpu3 offline
topology_thread_cpumask(3) returns empty
topology_thread_cpumask(7) returns 7
2. After CPU7 offline
topology_thread_cpumask(3) returns empty
topology_thread_cpumask(7) returns empty
3. When CPU3 online, calling intel_pmu_cpu_starting
topology_thread_cpumask(3) returns 3
topology_thread_cpumask(7) returns empty
for_each_cpu(i, topology_thread_cpumask(cpu)) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
kfree(cpuc->per_core);
cpuc->per_core = pc;
break;
}
}
Above "if" statement will not be executed, because pc->core_id was
initialized to -1 in intel_pmu_cpu_prepare.
4. When CPU7 online, calling intel_pmu_cpu_starting
topology_thread_cpumask(3) returns 3, 7
topology_thread_cpumask(7) returns 3, 7
for_each_cpu(i, topology_thread_cpumask(cpu)) {
struct intel_percore *pc = per_cpu(cpu_hw_events, i).per_core;
if (pc && pc->core_id == core_id) {
kfree(cpuc->per_core);
cpuc->per_core = pc;
break;
}
}
cpuc->per_core->core_id = core_id;
cpuc->per_core->refcnt++;
Above "if" statement will be executed and the per_core data allocated
for cpu7 will be freed.
All above is right, or could you explain more about the problem at CPUs
offline and online?
Thanks,
--
No, it does not include offline cpus.
For x86 code, remove_siblinginfo() clears the bits.
take_cpu_down ->
__cpu_disable ->
native_cpu_disable ->
cpu_disable_common ->
remove_siblinginfo
static void remove_siblinginfo(int cpu)
{
int sibling;
struct cpuinfo_x86 *c = &cpu_data(cpu);
for_each_cpu(sibling, cpu_core_mask(cpu)) {
cpumask_clear_cpu(cpu, cpu_core_mask(sibling));
/*/
* last thread sibling in this cpu core going down
*/
if (cpumask_weight(cpu_sibling_mask(cpu)) == 1)
cpu_data(sibling).booted_cores--;
}
for_each_cpu(sibling, cpu_sibling_mask(cpu))
cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
cpumask_clear(cpu_sibling_mask(cpu));
cpumask_clear(cpu_core_mask(cpu));
c->phys_proc_id = 0;
c->cpu_core_id = 0;
cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
}
--
Borislav, is topology_core_cpumask() the right mask for northbridge_id span? I could imagine Magny-Cours would have all 12 cores in the core_cpumask() and have the node_mask() be half that. --
Adding Andreas since this is his code. So, topology_core_cpumask() or cpu_core_mask() both are cpu_core_map which represents the socket mask. I.e., on a multisocket cpu you'll have in it all the cores on one socket. A 12-cores Magny-Cours contains two internal northbridges and this mask will have 12 bits set. AFAICT, you want to iterate over the cores on a single node here (an internal node in the Magny-Cours case) so for this we have the llc_shared_map. See near the top of cache_shared_cpu_map_setup() in <arch/x86/kernel/cpu/intel_cacheinfo.c> for an example. node_mask() is roughly the same but contains correct info only with CONFIG_NUMA on and correct SRAT table on the system. HTH. -- Regards/Gruss, Boris. Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 --
