Don, Found your patch on LKML (I am not on it). I am assuming you're talking about back-to-back NMIs here, not nested NMIs. I don't quite understand the scenario here. Is it the case that you handled 1 overflow, and then right as you return from the interrupt, you get a second PMI with a ovfl_status=0 ? What events did you measure? Which counters did you use? --
It is related to this thread: http://lkml.org/lkml/2010/8/25/124 Not acking the status immediately triggered an nmi, but the status was 0. Acking after reading and before processing the counters results in a non-zero status and thus, no empty nmi. -- Advanced Micro Devices, Inc. Operating System Research Center --
Robert, Do you have the test program you used to test this? I believe the NHM hack does not solve the problem, it just makes it harder to appear. I suspect the real issue is that the GLOBAL_STATUS bitmask cannot be trusted. I'd like to verify this. Has the problem appear only on Nehalem or also on Westmere? --
For testing back-to-back nmis I have used:
perf record -e cycles -e instructions -e cache-references
-e cache-misses -e branch-misses -a -- sleep 10
with load on all cpus. But I couldn't reproduce this particular
problem as I do not have such a system available. I think it might
trigger also with only one counter running. What the observed from the
So yes, it looks like it is a cpu bug with a race then clearing the
I don't know.
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
--
I was able to duplicate on Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz Intel(R) Xeon(R) CPU X5560 @ 2.80GHz with just running 'perf top' for about 60 seconds. You would need the first three patches to expose the problem. Reading the code, it seemed like the perf counters should be disabled and this patch should be unecessary, but after playing around with the code for a few hours, I came up with this patch to trap the issue. I read through the cpu errata and could not find anything related but I might have missed something. I am willing to help test if you have a more targeted patch. Cheers, Don --
Don, I managed to reproduce on core i7 860 (without patch4). Looking at the code again, I am dubious you ever execute the retry goto. If the PMU is disabled and you've just cleared the OVF_STAT, then I don't see where the new overflows would come from. But that's a separate problem. One thing I did is to compare status obtained via OVFL_STATUS with one that I build manually by inspecting each individual counter. The two returned bitmasks should always be identical (with PEBS disabled). When I got the spurious NMI, it did not trip my status validation. So the OVFL_STATUS is valid. I found something else that looked fishy. I am experimenting Thanks. --
I agree with you, but a printk before the goto proved otherwise! :-) And a printk of the status bit that triggered the goto happened to be the same one that we initially cleared. Like I said when I initially posted the patch, I am not sure why it works but it does do something to stem the NMI. There is probably a deeper problem here, I was just trying to get the external/unknown nmis working again. Cheers, Don --
Well, I was wrong about my status validation. I had my test wrong. I am seeing discrepancies between OVF_STATUS (status) and manual inspection of each counter (status2) compared to active_mask (act): [ 489.338564] CPU2 status=0x6 status2=0x4 act=0x7 [ 501.484235] CPU2 status=0x6 status2=0x4 act=0x7 [ 541.686414] CPU2 status=0x6 status2=0x4 act=0x7 [ 561.819575] CPU2 status=0x6 status2=0x4 act=0x7 Note that I have the NMI watchdog on, and running perf record with 2 events only. OVF_STATUS reports more than what has actually occurred. I think that is fine because in the loop there is a check for against active_mask. Nevertheless, I think there is a problem right there. I am trying to narrow down when it happens. It seems to occur only when I start a new perf record session. And possibly when perf runs on a CPU that is not monitored: taskset -c 4 perf record -e cycles -e instructions -F 2000 -a -C 2 Yes, I suspect the clearing does not work well. Let's try to do it multiple times I suspect that if we were not to use OVF_STATUS, then we would not need the back-to-back nmi logic. NMi cannot be nested. So just looking at the counters --
One thing we still need to do is on init detect if the BIOS is using one of the PMCs and simply disable all of perf and print a nice big message to the user to request a new BIOS from their vendor. --
Given then way perf_events operate, that is your only choice at this point. But I am sure neither my system nor yours is subject to this particular issue yet there is some unexplained errors with OVF_STATUS. Here is an example of what I gathered on a Westmere: This is coming into the interrupt handler: - status = overflow status coming from GLOBAL_OVF_STATUS - status2 = inspection of the counters - act = cpuc->active_mask[0] In case both status don't match, I dump the state of the active events incl. the counter values(val). [ 822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7 [ 822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298 [ 822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229 [ 822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9 Here only counter2 has overflowed, yet the handler will also process counter1 which is wrong. The other thing I noticed is that in intel_pmu_disable_event(), the event stopped sometimes has overflowed. Looks like OVF_STAUS is stale. Maybe OVF_STATUS is not cleared properly somewhere, possibly when an event gets disabled. I have a busy system, with the NMI watchdog running (0x13003c) where I do: perf record -a -C 1 -e cycles:k -ecycles:u -F 10 -- sleep 10 --
Well, it wouldn't be too hard to cure that, but the BIOS should simply keep its grubby paws of the PMU -- I'm really not interested in Right, we could easily revert to scanning all counters like we do for Right, the code pretty much assumes that if it overflows a PMI will be generated. So you're saying a pending PMI might get canceled when we clear the EN bit? Most icky. --
Well, that's the question! Looks like this may be more reliable, yet more costly. And also you'd have to deal with PEBS separately, though using No, I don't think that cancels it. But that may be a reason why there are back-to-back NMIs, with nothing to process sometimes (event not in the active_mask anymore). --
Hmm, the test case I used was 'perf top'. This was only using perf counter0. So I am not sure, at least in my case, it was a stale event. Though I don't think I remember checking the status immediately after acking it just to verify the ack worked. I'll poke some more on my machine today. Cheers, Don --
I am not acking it right away, this is still without patch 4/4. I am trying to --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian | [RFC/PATCH] RE: git calls SSH_ASKPASS even if DISPLAY is not set |
| Junio C Hamano | Re: Rss produced by git is not valid xml? |
| Linux Kernel Mailing List | iSeries: fix section mismatch in iseries_veth |
