Thread: 6 messages, 3 authors, 2012-01-30

oprofile and ARM A9 hardware counter

From: stephane eranian <hidden>
Date: 2012-01-30 20:45:44
Also in: linux-omap

On Mon, Jan 30, 2012 at 8:14 PM, Will Deacon [off-list ref] wrote:
On Mon, Jan 30, 2012 at 05:45:19PM +0000, stephane eranian wrote:
quoted
There you go, no attachment, not sure the omap list
supports this.
Cheers Stephane.
quoted
There is something quite interesting to observe.

While I run perf record -e cycles -F 100 noploop 10, I watch
/proc/interrupts. The number of interrupts is way lower than
expected. Therefore the number of samples is way too low:

$ perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
cycles stats:
           TOTAL events:        535
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
          SAMPLE events:        520

The delta in /proc/interrupts on CPU1 is 520 interrupts.
Yes, that is about half of what you'd expect. Running on my A9 platform
(vexpress) I get:

$ perf record -e cycles -F 100 noploop 10
$ perf report -D | tail -20
cycles stats:
          TOTAL events:       1007
           MMAP events:         18
           COMM events:          2
           EXIT events:          2
         SAMPLE events:        985
quoted
So it looks like the frequency adjustment, which is hooked off of the
timer tick, is either not called at each timer tick, the timer ticks are
not at regular intervals, or the math is wrong.
My hunch is that the interval is probably varying, but I don't know much
about OMAP4 and its clocks.
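
To make the varying-interval hypothesis concrete, here is a toy model in Python. It is not the kernel's code: it only assumes that the period is re-estimated once per tick from the events counted since the previous tick, and that the estimate takes a nominal tick length for granted. The function name, the 1 GHz event rate, and the factor of two between assumed and actual tick length are all illustrative assumptions; nothing in this thread establishes that ticks are actually being stretched or skipped.

```python
def samples_in_run(target_hz, assumed_tick_s, actual_tick_s, event_rate_hz, run_s):
    """Toy model (hypothetical): samples collected when the sampling period
    is derived from a per-tick event count and an assumed tick length."""
    # Events really counted between two consecutive ticks.
    events_per_tick = event_rate_hz * actual_tick_s
    # Event rate the adjustment would infer if it trusts the nominal tick.
    estimated_rate_hz = events_per_tick / assumed_tick_s
    # Period (events per sample) chosen to hit the target frequency.
    period = estimated_rate_hz / target_hz
    # Samples actually produced over the whole run.
    return event_rate_hz * run_s / period


nominal_tick = 1 / 128  # HZ=128, as reported for the Panda in this thread

# Ticks arrive on time: the target rate is met.
print(samples_in_run(100, nominal_tick, nominal_tick, 1e9, 10))      # 1000.0
# Ticks arrive twice as far apart as assumed: half the samples.
print(samples_in_run(100, nominal_tick, 2 * nominal_tick, 1e9, 10))  # 500.0
```

In this model the sample count scales with assumed/actual tick length, so a tick that is effectively twice as long as expected would give roughly the halved count seen on the Panda. That is only consistent with the hunch above, not a confirmation of it.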
Glad you tested this. At least, it seems the generic perf_event code
is all right.
I agree with you, something is fishy with the clocks. Just out of
curiosity, what is the HZ value for your board? On my Panda it's 128 Hz.
quoted
If I go with the fixed period mode:
$ perf stat -e cycles noploop 10
noploop for 10 seconds
 Performance counter stats for 'noploop 10':
       10079156960 cycles                    #    0.000 GHz
      10.004547117 seconds time elapsed

That means, if I want 100 samples/sec, the period is 10079156960/(10*100) = 10079157 cycles:
$ perf record -e cycles -c 10079157 noploop 10
$ perf report -D | tail -20
cycles stats:
           TOTAL events:       1003
            MMAP events:         11
            COMM events:          2
            EXIT events:          2
        THROTTLE events:          1
      UNTHROTTLE events:          1
          SAMPLE events:        986

Now, we're getting the right answer!
Just to confirm, for me:

$ perf stat -e cycles ./noploop 10
noploop for 10 seconds

 Performance counter stats for './noploop 10':

       4001163930 cycles                    #    0.000 GHz

     10.006534024 seconds time elapsed

$ perf record -e cycles -c 4001163 ./noploop 10
$ perf report -D | tail -20
 Aggregated stats:
          TOTAL events:       1020
           MMAP events:         18
           COMM events:          2
           EXIT events:          2
         SAMPLE events:        998
cycles stats:
          TOTAL events:       1020
           MMAP events:         18
           COMM events:          2
           EXIT events:          2
         SAMPLE events:        998

which is close enough :)
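
For reference, the fixed-period arithmetic used in both runs can be written out as a short Python sketch. The cycle counts are the ones reported by perf stat above; the function names are made up for illustration, and the 10 second run length is the nominal noploop argument rather than the measured elapsed time.

```python
def fixed_period(total_cycles, run_seconds, samples_per_sec):
    """Cycles per sample, rounded to nearest, for a target sample rate."""
    wanted_samples = run_seconds * samples_per_sec
    return (total_cycles + wanted_samples // 2) // wanted_samples


def expected_samples(total_cycles, period):
    """One sample per full period of counted cycles."""
    return total_cycles // period


# Cycle counts from the two 'perf stat -e cycles noploop 10' runs above.
panda_cycles = 10079156960
vexpress_cycles = 4001163930

print(fixed_period(panda_cycles, 10, 100))         # 10079157
print(fixed_period(vexpress_cycles, 10, 100))      # 4001164
print(expected_samples(panda_cycles, 10079157))    # 999
print(expected_samples(vexpress_cycles, 4001163))  # 1000
```

Rounding gives 4001164 for the vexpress run, while the command above used the truncated value 4001163; the difference is far too small to matter. The observed 986 and 998 samples are both within a couple of percent of these estimates, unlike the 520 samples seen in frequency mode on the Panda.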
quoted
We need to elucidate what's going on in perf_event_task_tick().
I have tried with my throttling fix and it did not help. We are
not subject to throttling with such a low rate.
Ok. I would start by looking at the clock ticks if I were you, since this
seems to be alright on my board.

Will