Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter

Previous thread: Issues in Linux HPET example by Jaswinder Singh Rajput on Wednesday, September 1, 2010 - 5:47 am. (2 messages)

Next thread: [PATCH 1/6] staging/comedi: fix syntax error by =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= on Wednesday, September 1, 2010 - 6:40 am. (6 messages)
From: Stephane Eranian
Date: Wednesday, September 1, 2010 - 6:04 am

Don,

Found your patch on LKML (I am not on it).


I am assuming you're talking about back-to-back NMIs here, not nested NMIs.
I don't quite understand the scenario here. Is it the case that you handled 1
overflow, and then right as you return from the interrupt, you get a second
PMI with a ovfl_status=0 ?

What events did you measure? Which counters did you use?
--

From: Robert Richter
Date: Wednesday, September 1, 2010 - 7:57 am

It is related to this thread:

 http://lkml.org/lkml/2010/8/25/124

Not acking the status immediately triggered an nmi, but the status was
0. Acking after reading and before processing the counters results in
a non-zero status and thus, no empty nmi.


-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--

From: Stephane Eranian
Date: Thursday, September 2, 2010 - 1:13 am

Robert,

Do you have the test program you used to test this?
I believe the NHM hack does not solve the problem, it
just makes it harder to appear.

I suspect the real issue is that the GLOBAL_STATUS
bitmask cannot be trusted. I'd like to verify this.

Has the problem appear only on Nehalem or also on
Westmere?


--

From: Robert Richter
Date: Thursday, September 2, 2010 - 6:11 am

For testing back-to-back nmis I have used:

 perf record -e cycles -e instructions -e cache-references
       -e cache-misses -e branch-misses -a -- sleep 10

with load on all cpus. But I couldn't reproduce this particular
problem as I do not have such a system available. I think it might
trigger also with only one counter running. What the observed from the

So yes, it looks like it is a cpu bug with a race then clearing the

I don't know.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center

--

From: Don Zickus
Date: Thursday, September 2, 2010 - 7:19 am

I was able to duplicate on

Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz
Intel(R) Xeon(R) CPU           X5560  @ 2.80GHz

with just running 'perf top' for about 60 seconds.

You would need the first three patches to expose the problem.

Reading the code, it seemed like the perf counters should be disabled and
this patch should be unecessary, but after playing around with the code
for a few hours, I came up with this patch to trap the issue.

I read through the cpu errata and could not find anything related but I
might have missed something.

I am willing to help test if you have a more targeted patch.

Cheers,
Don
--

From: Stephane Eranian
Date: Thursday, September 2, 2010 - 7:39 am

Don,

I managed to reproduce on core i7 860 (without patch4).
Looking at the code again, I am dubious you ever execute
the retry goto. If the PMU is disabled and you've just
cleared the OVF_STAT, then I don't see where the new
overflows would come from. But that's a separate problem.

One thing I did is to compare status obtained via OVFL_STATUS
with one that I build manually by inspecting each individual
counter. The two returned bitmasks should always be identical
(with PEBS disabled).  When I got the spurious NMI, it did not
trip my status validation. So the OVFL_STATUS is valid.

I found something else that looked fishy. I am experimenting
Thanks.
--

From: Don Zickus
Date: Thursday, September 2, 2010 - 8:47 am

I agree with you, but a printk before the goto proved otherwise! :-)

And a printk of the status bit that triggered the goto happened to be the
same one that we initially cleared.  Like I said when I initially posted
the patch, I am not sure why it works but it does do something to stem the
NMI.

There is probably a deeper problem here, I was just trying to get the
external/unknown nmis working again.

Cheers,
Don
--

From: Stephane Eranian
Date: Thursday, September 2, 2010 - 9:18 am

Well, I was wrong about my status validation. I had my test wrong.
I am seeing discrepancies between OVF_STATUS (status)
and manual inspection of each counter (status2) compared to
active_mask (act):

[  489.338564] CPU2 status=0x6 status2=0x4 act=0x7
[  501.484235] CPU2 status=0x6 status2=0x4 act=0x7
[  541.686414] CPU2 status=0x6 status2=0x4 act=0x7
[  561.819575] CPU2 status=0x6 status2=0x4 act=0x7

Note that I have the NMI watchdog on, and running perf record
with 2 events only.

OVF_STATUS reports more than what has actually occurred.
I think that is fine because in the loop there is a check for against
active_mask. Nevertheless, I think there is a problem right there.

I am trying to narrow down when it happens. It seems to occur
only when I start a new perf record session. And possibly when
perf runs on a CPU that is not monitored:

taskset -c 4   perf record -e cycles -e instructions -F 2000 -a  -C 2

Yes, I suspect the clearing does not work well. Let's try to do it
multiple times
I suspect that if we were not to use OVF_STATUS, then we would not need
the back-to-back nmi logic. NMi cannot be nested. So just looking at
the counters
--

From: Peter Zijlstra
Date: Friday, September 3, 2010 - 1:33 am

One thing we still need to do is on init detect if the BIOS is using one
of the PMCs and simply disable all of perf and print a nice big message
to the user to request a new BIOS from their vendor.


--

From: Stephane Eranian
Date: Friday, September 3, 2010 - 4:02 am

Given then way perf_events operate, that is your only choice at this point.

But I am sure neither my system nor yours is subject to this particular issue
yet there is some unexplained errors with OVF_STATUS.

Here is an example of what I gathered on a Westmere:

This is coming into the interrupt handler:
- status   = overflow status coming from GLOBAL_OVF_STATUS
- status2 = inspection of the counters
- act = cpuc->active_mask[0]

In case both status don't match, I dump the state of the active events
incl. the counter values(val).

[  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
[  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
[  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
[  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9

Here only counter2 has overflowed, yet the handler will also process counter1
which is wrong.

The other thing I noticed is that in intel_pmu_disable_event(), the event
stopped sometimes has overflowed. Looks like OVF_STAUS is stale.
Maybe OVF_STATUS is not cleared properly somewhere, possibly when
an event gets disabled.

I have a busy system, with the NMI watchdog running (0x13003c) where I do:

perf record -a -C 1 -e cycles:k -ecycles:u -F 10 -- sleep 10
--

From: Peter Zijlstra
Date: Friday, September 3, 2010 - 4:11 am

Well, it wouldn't be too hard to cure that, but the BIOS should simply
keep its grubby paws of the PMU -- I'm really not interested in



Right, we could easily revert to scanning all counters like we do for

Right, the code pretty much assumes that if it overflows a PMI will be
generated. So you're saying a pending PMI might get canceled when we
clear the EN bit? Most icky.
--

From: Stephane Eranian
Date: Friday, September 3, 2010 - 4:52 am

Well, that's the question! Looks like this may be more reliable, yet more
costly. And also you'd have to deal with PEBS separately, though using
No, I don't think that cancels it. But that may be a reason why there are
back-to-back NMIs, with nothing to process sometimes (event not in the
active_mask anymore).
--

From: Don Zickus
Date: Friday, September 3, 2010 - 7:03 am

Hmm, the test case I used was 'perf top'.  This was only using perf
counter0.  So I am not sure, at least in my case, it was a stale event.
Though I don't think I remember checking the status immediately after
acking it just to verify the ack worked.

I'll poke some more on my machine today.

Cheers,
Don
--

From: Stephane Eranian
Date: Friday, September 3, 2010 - 7:28 am

I am not acking it right away, this is still without patch 4/4. I am trying to
--

Previous thread: Issues in Linux HPET example by Jaswinder Singh Rajput on Wednesday, September 1, 2010 - 5:47 am. (2 messages)

Next thread: [PATCH 1/6] staging/comedi: fix syntax error by =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= on Wednesday, September 1, 2010 - 6:40 am. (6 messages)