Thread: 48 messages, 3 authors, 2015-10-02

[RFC PATCH 14/20] coresight: etm-perf: implementing 'event_init()' API

From: alexander.shishkin@linux.intel.com (Alexander Shishkin)
Date: 2015-09-30 09:45:14
Also in: lkml

Mathieu Poirier writes:
> On 22 September 2015 at 08:29, Alexander Shishkin wrote:
>> Mathieu Poirier writes:
>>
>>> +static void etm_event_destroy(struct perf_event *event)
>>> +{
>>> +     /* switching off the source will also tear down the path */
>>> +     etm_event_power_sources(event->cpu, false);
>>> +}
>>> +
>>> +static int etm_event_init(struct perf_event *event)
>>> +{
>>> +     int ret;
>>> +
>>> +     if (event->attr.type != etm_pmu.type)
>>> +             return -ENOENT;
>>> +
>>> +     if (event->cpu >= nr_cpu_ids)
>>> +             return -EINVAL;
>>> +
>>> +     /* only one session at a time */
>>> +     if (etm_event_source_enabled(event->cpu))
>>> +             return -EBUSY;
>>
>> Why is this the case? If you were to configure the event in pmu::add()
>> and deconfigure it in pmu::del(), like you already do with the buffer
>> part, you could handle as many sessions as you want.
>
> Apologies for the late reply, I was travelling.
>
> We certainly don't want to have more than one trace session going on
> at any given time, especially if the sessions have different
> configuration parameters. Moreover, doing the tracer configuration as
> part of pmu::add() is highly redundant.

But why?

The whole point of using perf for this is that it does all the tricky
context switching for us, all the cross-cpu calling to enable/disable
the events, etc., so that we can run multiple sessions in parallel
without having to worry (much) about scheduling. (Aside, of course, from
other useful things like sideband events, but that's another topic.)
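To illustrate the add()/del() approach, here is a minimal userspace model in C. Everything in it is hypothetical: `struct tracer_regs` stands in for one CPU's tracer registers, `struct session_cfg` for the configuration a session would capture when its event is created, and `model_pmu_add()`/`model_pmu_del()` only mimic what pmu::add()/pmu::del() would do. It is a sketch of the idea, not the coresight driver's code: each session carries its own configuration and the hardware is programmed only while that session's event is scheduled in, so two sessions can share one tracer over time.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one CPU's trace source (not the real ETM layout). */
struct tracer_regs {
	uint32_t ctrl;     /* hypothetical control register */
	uint32_t addr_lo;  /* hypothetical address-range filter, low bound */
	uint32_t addr_hi;  /* hypothetical address-range filter, high bound */
	int enabled;       /* nonzero while a session owns the tracer */
};

/* Hypothetical per-session configuration, captured once at event creation. */
struct session_cfg {
	uint32_t ctrl;
	uint32_t addr_lo;
	uint32_t addr_hi;
};

/*
 * Models pmu::add(): program this session's configuration and enable.
 * Returns -EBUSY if another session currently owns the tracer.
 */
static int model_pmu_add(struct tracer_regs *hw, const struct session_cfg *cfg)
{
	assert(hw != NULL && cfg != NULL);
	if (hw->enabled)
		return -EBUSY;
	hw->ctrl = cfg->ctrl;
	hw->addr_lo = cfg->addr_lo;
	hw->addr_hi = cfg->addr_hi;
	hw->enabled = 1;
	return 0;
}

/* Models pmu::del(): disable and clear, leaving the tracer free for the next session. */
static void model_pmu_del(struct tracer_regs *hw)
{
	memset(hw, 0, sizeof(*hw));
}
```

In this model it is perf's scheduling that decides which session's configuration is live at any moment, so a global "one session only" check like the -EBUSY in etm_event_init() would not be needed.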

>> This can be done in pmu::add(), if you can call directly into
>> etm_configure_cpu() or etm_config_enable() so that there's no cross-cpu
>> calling in between.
>
> As per my comment above, reconfiguring the tracers every time it is
> about to run is redundant and extensive (etm_configure_cpu() isn't
> exactly short), incurring a cost that is likely to be higher than
> calling get_online_cpus().

I was actually referring to synchronous smp_call_function*()s, which
obviously won't work here, since pmu::add() is called with interrupts
disabled. But the good news is that they are also redundant.

But I don't see anything expensive in configuring the ETM and ETB in
pmu::add(); as far as I can tell, it's just a bunch of register
writes. If you want to optimize those, you could compare the new context
against the previous one and only update the registers that need to be
updated. You could also get rid of the spinlock, because there won't be
any local racing (again, AFAICT neither the ETM nor the ETB generates
interrupts).
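The "only update what changed" idea can be sketched as a shadow-register cache. This is a hypothetical userspace model: `struct reg_cache`, `program_context()`, `hw_write()` and the register count are illustrative names, not part of the coresight driver. The assumption it rests on is that nothing modifies the registers behind the cache's back; if the tracer can lose state (for example when its power domain goes down), `valid` would have to be cleared so the next call rewrites everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CFG_REGS 8  /* hypothetical number of configuration registers */

/* Software copy of what was last written to the hardware. */
struct reg_cache {
	uint32_t shadow[NR_CFG_REGS];
	int valid;            /* 0 until the first full write, or after state loss */
	unsigned int writes;  /* number of (simulated) register writes issued */
};

/* Stand-in for an MMIO write such as writel_relaxed(); here it only counts. */
static void hw_write(struct reg_cache *c, size_t idx, uint32_t val)
{
	assert(idx < NR_CFG_REGS);
	(void)val;
	c->writes++;
}

/* Program a new context, skipping registers whose value is unchanged. */
static void program_context(struct reg_cache *c, const uint32_t cfg[NR_CFG_REGS])
{
	for (size_t i = 0; i < NR_CFG_REGS; i++) {
		if (c->valid && c->shadow[i] == cfg[i])
			continue;  /* same value as last time: skip the write */
		hw_write(c, i, cfg[i]);
		c->shadow[i] = cfg[i];
	}
	c->valid = 1;
}
```

Switching between two contexts that differ in one register then costs one write instead of a full reprogram.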

That said, one expensive thing is reading out the ETB buffer on every
sched out, and that is the real problem, because it slows down the fast
path by a loop of arbitrary length reading out hardware registers. IIRC,
ETBs could be up to 64K?
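To make that cost concrete, assume, as with the ETB, that the trace RAM is only reachable through a read-pointer register and a read-data register that advances on each access. Draining it then takes one register read per word. The model below is hypothetical (`struct fake_etb`, `etb_read_data()` and `etb_drain()` are made-up names and the layout is simplified); it only shows why the sched-out path scales with the amount of trace in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an ETB-style trace RAM that software can only
 * reach through a read-pointer register and a read-data register. */
struct fake_etb {
	const uint32_t *ram;  /* backing trace RAM */
	size_t depth;         /* RAM size in 32-bit words */
	size_t read_ptr;      /* models the read-pointer register */
	size_t reg_reads;     /* register reads performed so far */
};

/* Models one read of the read-data register: returns a word and
 * advances the read pointer, wrapping at the end of the RAM. */
static uint32_t etb_read_data(struct fake_etb *etb)
{
	uint32_t word = etb->ram[etb->read_ptr];

	etb->read_ptr = (etb->read_ptr + 1) % etb->depth;
	etb->reg_reads++;
	return word;
}

/* Copy 'len' words starting at 'start' into 'out'. The cost is one
 * register read per word, so it grows linearly with the amount of trace. */
static void etb_drain(struct fake_etb *etb, size_t start, size_t len, uint32_t *out)
{
	assert(etb->depth > 0 && len <= etb->depth);
	etb->read_ptr = start % etb->depth;
	for (size_t i = 0; i < len; i++)
		out[i] = etb_read_data(etb);
}
```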

But a TMC-enabled coresight should do much better in this regard.

Thanks,
--
Alex