Thread (15 messages) 15 messages, 2 authors, 2021-10-19

Re: [PATCH v10 4/5] arm64: perf: Enable PMU counter userspace access for perf event

From: Mark Rutland <mark.rutland@arm.com>
Date: 2021-10-19 15:28:43
Also in: linux-perf-users, lkml

Hi Rob,

On Thu, Oct 14, 2021 at 02:24:46PM -0500, Rob Herring wrote:
On Thu, Oct 14, 2021 at 11:58 AM Mark Rutland [off-list ref] wrote:
quoted
On Tue, Sep 14, 2021 at 03:47:59PM -0500, Rob Herring wrote:
For the `config1 = 3` case (potentially) overriding the usual long
semantic, I'm struggling to understand why we need that rather than
forcing the use of a 64-bit counter, because in that case:

* For a CPU_CYCLES event:
  __armv8_pmuv3_map_event() will always pick 64-bits
  get_event_idx() may fail to allocate a 64-bit counter.

* For other events:
  __armv8_pmuv3_map_event() will pick 32/64 based on long counter
  support
  get_event_idx() will only fail if there are no counters free.

Whereas if __armv8_pmuv3_map_event() returned an error for the latter
when long counter support is not implemented, we'd have consistent
`long` semantics, and the CPU_CYCLES behaviour would be identical.

What's the rationale for `3` leaving the choice to the kernel?
It's the give me the maximum sized counter the h/w can support choice.
That's easier for userspace to implement. Bit 1 is more of a hint that
the user wants userspace access rather than a requirement.
quoted
If the problem is discoverability, I'd be happy to add something to
sysfs to describe whether the PMU has long event support.
Checking sysfs or a try for 64-bit support then fall back to 32-bit
support isn't much difference.

Keep in mind that x86 always succeeds here. Every userspace user will
have to add whatever dance we create here. For example, each libperf
test with user access (there's only 2 in my tree, but there's a series
adding more) has to have an '#ifdef __aarch64__' for whatever we do
here. I was seeking to minimize that. Right now, that's just a set
config1 to 0x3. Also, note that libperf will opportunistically use a
userspace read instead of read(). The user just has to mmap the event
and libperf will use a userspace read when enabled which ultimately
depends on what the mmapped page says.
I think that x86 always succeeding here is more of a legacy thing that
they're stuck with rather than a design to be copied.

I'd prefer to keep the existing meaning of the `long` flag to mean "give
me 64 bits of counter, somehow", with `rdpmc` meaning "give me a single
counter I can access from userspace", even if that means the combination
of the two can sometimes be rejected. As you say, we can probe for that
as necessary by trying `long` then falling back to a plain event, and if
that ends up being a bottleneck somehow we can figure out a way of
advertising support to userspace. Regardless, we should 

Importantly, I don't think libperf should override a user's request for
`long`, since the user may want to optimize for minimal perturbation
rather than faster access.

If we want a "please give me the longest counter that's compatible with
other constraints", I think that should be a new flag e.g. `trylong`,
and shouldn't override the existing `long`. We can add that as a
follow-up if we want it.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help