On Fri, 2008-04-04 at 08:38 +0200, Arnd Bergmann wrote:
Our first thought to fix the bug was to just grab the mutex lock when
adding the switch notification data to the queue. The kernel gives us
an oops message saying something along the line of "could not call mutex
lock in interrupt context". Hence we had to go to work queues so we
could access the lock outside of the SPU switch notification context.
Secondly, it is my understanding that if the schedule_work() call tries
to queue the same work function multiple times the subsequent requests
are dropped. Thus we were not able to pass the context switch data as
part of the schedule work requests. This forced us to have an array to
store the data for each SPU.
Until the OProfile sees the context switch record, it does not know what
to do with the PC samples and just drops them. The thought was using a
private work queue might help get the context switch records processed a
little earlier. It probably doesn't make that much difference. I can
just use the generic work queue.
The goal is to be able to support very high sampling rates (small
periods). The schedule_delayed_work() is based on jiffies which I
believe is 1/250 for this machine. This only gives millisecond
resolution. The goal is for the users to be able to specify a time
period of 60,000 cycles or less then 20 micro second sampling periods
when the real high resolution timers are available. We can't achieve
the desired sampling rates with the schedule_dealyed_work() function.
Oprofile provides nice clean interfaces for recording kernel/user
switches and CPU data recording. This is all that was needed by any
architecture until CELL came along. With CELL, we now have need to add
processor data plus SPU data to the queue. The buffer_mutex variable
and the add_event_entry() were not visible outside of the OProfile
driver code. The original SPU support added add_event_entry() to the
include/linux/oprofile.h file. We can add the buffer_mutex as well
since there is now a need to access both of these.
I have been looking to see how I could create a generic oprofile routine
which could take the data. The routine would still have to work from an
interrupt context, so it will need to store the data and call a work
queue function. The function would need to know how much data will be
needed, thus you would probably need to statically allocate data or use
a list and malloc the data as needed. I don't really want to have to
malloc data from an interrupt context. List management adds additional
overhead. It would be possible to have an init function that you could
call at startup time telling it how much memory you need, in this case
we could allocate a buffer the size of spu_info (defined below) at
startup time. The call could pass an array to the OProfile routine that
would put the data into the buffer and call the work function. We still
have to allocate the storage, it does clean up the arch specific code.
Not sure if this really buys us much. There is more copying of data
i.e. more overhead. Not convinced the OProfile maintainers would accept
anything I have thought of so far.
Any suggestions?
The spu_context_switch_data structure is the type of the array used in
struct spus_cntxt_sw_data_s which is the data used by the work queue.
Yes, spus_profiling_code_data seems to be unused.
Yes, it was recognized that we could overwrite data. The expectation is
that the workqueue function will run fairly quickly. If the SPU context
was only loaded for a very short period of time at most a few samples
would be captured for that frame so the error would be down in the
noise. OProfile is a statistical tool. If we don't get enough samples
for the spu context, then the data is not statistically valid. Losing
the context switch is not an issue. The context must be loaded long
enough that we collect a reasonable number of samples for the output to
be meaningful.
It was not dropped. It was implemented. The issue that I have is the
dcookie spu_cookie, app_dcookie, offset, objectId is not included in the
local cahced spu context data. Previously, there was no need to save it
since it was being immediatly written to oprofile buffer (without
holding the mutex lock). Now we need to store the data until the
workqueue function runs. We can store it in the array as I have done or
you could put it in the spu context. Functionally, it doesn't change
anything. The data in the SPU context would get overwritten, just as it
does in the array, if there was an SPU context switch before the
workqueue function runs so that doesn't solve that issue.
--