Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs

From: Ingo Molnar
Date: Wednesday, August 25, 2010 - 4:00 am

* Ingo Molnar <mingo@elte.hu> wrote:


Here's a more detailed description of the regression introduced by:

  4a31beb: perf, x86: Fix handle_irq return values
  8e3e42b: perf, x86: Try to handle unknown nmis with an enabled PMU

The debug kernel boots up fine, with no NMI messages, as expected.

Then when i start 'perf top' for the first time i get the NMI message 
with this debug output:

 cpu #15, nmi #160, marked #0, handled = 1, time = 333392635730, delta = 11238255
 cpu #15, nmi #161, marked #0, handled = 1, time = 333403779380, delta = 11143650
 cpu #15, nmi #162, marked #0, handled = 1, time = 333415418497, delta = 11639117
 cpu #15, nmi #163, marked #0, handled = 1, time = 333415467084, delta = 48587
 cpu #15, nmi #164, marked #0, handled = 1, time = 333415501531, delta = 34447
 cpu #15, nmi #165, marked #0, handled = 1, time = 333459918106, delta = 44416575
 cpu #15, nmi #166, marked #0, handled = 0, time = 333459923167, delta = 5061
 cpu #15, nmi #151, marked #0, handled = 1, time = 332978597882, delta = 11447002
 cpu #15, nmi #152, marked #0, handled = 1, time = 332978657151, delta = 59269
 cpu #15, nmi #153, marked #0, handled = 1, time = 332978667847, delta = 10696
 cpu #15, nmi #154, marked #0, handled = 1, time = 333023125757, delta = 44457910
 cpu #15, nmi #155, marked #0, handled = 1, time = 333291980833, delta = 268855076
 cpu #15, nmi #156, marked #0, handled = 1, time = 333325663125, delta = 33682292
 cpu #15, nmi #157, marked #0, handled = 1, time = 333348216481, delta = 22553356
 cpu #15, nmi #158, marked #0, handled = 1, time = 333370168887, delta = 21952406
 cpu #15, nmi #159, marked #0, handled = 1, time = 333381397475, delta = 11228588
 Uhhuh. NMI received for unknown reason 00 on CPU 15.
 Do you have a strange power saving mode enabled?
 Dazed and confused, but trying to continue
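
The entries in the trace above are not in NMI order (#160-#166 are printed before #151-#159), so it can help to sort them before looking for the unhandled one. Below is a minimal sketch of that, assuming the line format quoted above, which comes from this debugging session and is not a stable kernel interface. The names `parse_nmi_trace` and `unhandled_with_gap` are hypothetical.

```python
import re

# Matches the debug lines quoted above. The format is specific to this
# debugging session (an assumption), not a stable kernel interface.
LINE_RE = re.compile(
    r"cpu #(?P<cpu>\d+), nmi #(?P<nmi>\d+), marked #(?P<marked>\d+), "
    r"handled = (?P<handled>\d+), time = (?P<time>\d+), delta = (?P<delta>\d+)"
)


def parse_nmi_trace(text):
    """Return one dict of ints per matching line, sorted by (cpu, nmi)."""
    records = [
        {key: int(value) for key, value in match.groupdict().items()}
        for match in LINE_RE.finditer(text)
    ]
    return sorted(records, key=lambda rec: (rec["cpu"], rec["nmi"]))


def unhandled_with_gap(records):
    """Yield (record, gap) for every NMI logged with handled == 0.

    gap is the difference between the 'time' field of the unhandled NMI
    and that of the previous NMI on the same CPU, recomputed from the
    timestamps; it is None when the trace holds no earlier NMI.
    """
    previous = {}
    for rec in records:
        if rec["handled"] == 0:
            prev = previous.get(rec["cpu"])
            gap = rec["time"] - prev["time"] if prev else None
            yield rec, gap
        previous[rec["cpu"]] = rec
```

Feeding the trace text to `parse_nmi_trace()` and then to `unhandled_with_gap()` should single out NMI #166, the only entry with handled = 0, together with its distance from NMI #165.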

When i start perf top for a second time, no messages are printed at all. 
The reason is that on one of the CPUs NMIs are 'stuck':

 NMI: 78164 67099 6342 [*] 65677 66119 63796 65395 63995 65012 64151 65082 
      63483 64948 62926 65608 62630

CPU#2 is stuck at 6342.
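
One way to spot such a stuck CPU without reading the counters by eye is to compare two snapshots of the NMI line taken a few seconds apart while perf top is running. The following is a rough sketch under the assumption that the counts appear in CPU order, as in the excerpt above; `nmi_counts` and `stuck_cpus` are hypothetical helper names.

```python
def nmi_counts(nmi_text):
    """Extract per-CPU NMI counts from the text of an 'NMI:' entry.

    Non-numeric tokens (the 'NMI:' label, a trailing description, or a
    hand-added marker such as '[*]') are ignored, so a wrapped excerpt
    like the one above parses as well.
    """
    return [int(token) for token in nmi_text.split() if token.isdigit()]


def stuck_cpus(before, after):
    """Return the indices of CPUs whose NMI count did not advance."""
    return [
        cpu
        for cpu, (old, new) in enumerate(zip(before, after))
        if new <= old
    ]
```

With two snapshots `before` and `after`, `stuck_cpus(nmi_counts(before), nmi_counts(after))` lists the CPUs that received no NMI in between.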

The NMIs work fine on other CPUs and perf top works (sans the missing 
samples from CPU#2), and the NMIs keep ticking.

The CPU is:

 processor	: 2
 vendor_id	: GenuineIntel
 cpu family	: 6
 model		: 26
 model name	: Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz
 stepping	: 5
 cpu MHz	: 2794.000
 cache size	: 8192 KB
 physical id	: 0
 siblings	: 8
 core id	: 1
 cpu cores	: 4
 apicid		: 2
 initial apicid	: 2
 fpu		: yes
 fpu_exception	: yes
 cpuid level	: 11
 wp		: yes
 flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
 bogomips	: 5599.98
 clflush size	: 64
 cache_alignment: 64
 address sizes	: 40 bits physical, 48 bits virtual
 power management:

The PMU init is:

 Performance Events: PEBS fmt1+, Nehalem events, Intel PMU driver.
 ... version:                3
 ... bit width:              48
 ... generic registers:      4
 ... value mask:             0000ffffffffffff
 ... max period:             000000007fffffff
 ... fixed-purpose events:   3
 ... event mask:             000000070000000f
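
The two masks in this printout are consistent with the counter counts and bit width listed next to them. As a small sanity check, here is a sketch that rebuilds them, assuming the fixed-purpose counters occupy bits 32 and up in the event mask (an assumption that matches the value printed above); the function names are hypothetical.

```python
FIXED_COUNTER_SHIFT = 32  # assumption: fixed counters start at bit 32


def value_mask(bit_width):
    """Mask covering the low 'bit_width' bits of a counter value."""
    return (1 << bit_width) - 1


def event_mask(num_generic, num_fixed):
    """One bit per generic counter, plus one bit per fixed counter
    starting at FIXED_COUNTER_SHIFT."""
    generic_bits = (1 << num_generic) - 1
    fixed_bits = ((1 << num_fixed) - 1) << FIXED_COUNTER_SHIFT
    return generic_bits | fixed_bits
```

Formatting `value_mask(48)` and `event_mask(4, 3)` with `format(x, "016x")` gives strings that can be compared directly against the "value mask" and "event mask" lines.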

I've attached the config as well.

Thanks,

	Ingo
