Re: [POC][RFC][PATCH 0/3] tracing: Add perf events to trace buffer

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2025-11-18 08:11:53
Also in: lkml

On Mon, 17 Nov 2025 22:42:27 -0500
Steven Rostedt [off-list ref] wrote:

quoted

As this will eventually work with many more perf events than just cache-misses
and cpu-cycles, using options is not appropriate, especially since the
options are limited to a 64 bit bitmask and the number of events can easily
go much higher.
I'm thinking about having a file instead that will act as a way to enable
perf events for events, function and function graph tracing.

  set_event_perf, set_ftrace_perf, set_fgraph_perf

What about adding a global `trigger` action file so that the user can
add these "perf" actions by writing into it? It would be something like
stacktrace for events. (Maybe we can move stacktrace/user-stacktrace
into it too.)

For pre-defined/software counters:
# echo "perf:cpu_cycles" >> /sys/kernel/tracing/trigger

For events, it would make more sense to put it into the events directory:

 # echo "perf:cpu_cycles" >> /sys/kernel/tracing/events/trigger

As there is already an events/enable.

Heck we could even add it per system:

 # echo "perf:cpu_cycles" >> /sys/kernel/tracing/events/syscalls/trigger

Yes, this will be very useful!

quoted

For some hardware event sources (see /sys/bus/event_source/devices/):
# echo "perf:cstate_core.c3-residency" >> /sys/kernel/tracing/trigger

echo "perf:my_counter=pmu/config=M,config1=N" >> /sys/kernel/tracing/trigger

Still need a way to add an identifier list. Currently, if the size of
the type identifier is one byte, then it can only support up to 256 events.

Yes, so if the user adds more than that, it will return -ENOSPC.
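
To make the limit concrete, here is a minimal user-space sketch of a one-byte
identifier table that refuses the 257th event with ENOSPC. It only models the
behavior described above; `PerfIdTable` and `MAX_PERF_IDS` are hypothetical
names, not anything from the patch series.

```python
import errno

# Assumption: a one-byte type identifier, so at most 256 distinct events.
MAX_PERF_IDS = 256


class PerfIdTable:
    """Hypothetical model of a per-instance perf event identifier table."""

    def __init__(self):
        self._ids = {}

    def add(self, name):
        """Return the identifier for name, allocating one if needed."""
        if name in self._ids:
            return self._ids[name]
        if len(self._ids) >= MAX_PERF_IDS:
            # Mirrors the -ENOSPC return discussed in this thread.
            raise OSError(errno.ENOSPC, "no free perf event identifier")
        self._ids[name] = len(self._ids)
        return self._ids[name]
```

Re-adding an already registered name returns its existing identifier, so only
distinct events count against the limit.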

Do we need every event for this? Or just have a subset of events that
would be supported?

For event tracing, these would probably be used to measure the delta between
paired events. For such a use case, the user may want to set it only on those
events.

quoted

If we need to set those counters for tracers and events separately,
we can add `events/trigger` and `tracer-trigger` files.

As I mentioned, the trigger for events should be in the events directory.

Agreed.

We could add an ftrace_trigger that can affect both the function and
function graph tracers.

Got it.

quoted

echo "perf:cpu_cycles" >> /sys/kernel/tracing/events/trigger

To disable counters, we can use '!', the same way as for event triggers.

echo '!perf:cpu_cycles' > trigger

Yes, it would follow the current way to disable a trigger.

quoted

To add two or more counters, connect them with ':'
(or we will allow appending new perf counters).
This allows the user to set perf counter options for each event.
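
As a rough illustration of the syntax proposed so far, the strings could be
parsed as below. This is only a sketch of the proposal in this thread, not an
existing kernel interface; `parse_perf_trigger`, `Counter` and `PerfTrigger`
are hypothetical names, and it assumes a PMU spec never contains ':'.

```python
from typing import List, NamedTuple, Optional


class Counter(NamedTuple):
    name: str             # e.g. "cpu_cycles" or "my_counter"
    spec: Optional[str]   # e.g. "pmu/config=M,config1=N", or None


class PerfTrigger(NamedTuple):
    remove: bool          # True when the string starts with '!'
    counters: List[Counter]


def parse_perf_trigger(text):
    """Parse '[!]perf:<counter>[:<counter>...]' (proposed syntax only)."""
    text = text.strip()
    remove = text.startswith("!")
    if remove:
        text = text[1:]
    action, _, rest = text.partition(":")
    if action != "perf" or not rest:
        raise ValueError("expected 'perf:<counter>'")
    counters = []
    for item in rest.split(":"):
        # Split only on the first '=' so the PMU spec keeps its own '='.
        name, eq, spec = item.partition("=")
        if not name:
            raise ValueError("empty counter name")
        counters.append(Counter(name, spec if eq else None))
    return PerfTrigger(remove, counters)
```

For example, parse_perf_trigger("!perf:cpu_cycles") yields a removal request
for the single counter cpu_cycles.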

Maybe we also should move 'stacktrace'/'userstacktrace' option
flags to it too eventually.

We can add them, but may never be able to remove them due to backward
compatibility.

Ah, indeed.

quoted

And an available_perf_events file that shows what can be written into these
files (similar to how set_ftrace_filter works). But for now, it was just
easier to implement them as options.

As for the perf event that is triggered, it is currently a dynamic array of
64 bit values. Each value is broken up into 8 bits for what type of perf
event it is, and 56 bits for the counter. It only writes a per CPU raw
counter and does not do any math. Any math would need to be done in
post processing.
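
For reference, the 8/56 split could be modeled like this in a post-processing
tool. This is a sketch only: the text above does not say which end of the
64 bit word holds the type, so placing it in the top 8 bits is an assumption,
and the function names are hypothetical.

```python
from typing import Tuple

TYPE_BITS = 8
COUNTER_BITS = 56
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def pack_perf_value(event_type: int, counter: int) -> int:
    """Pack a type and a raw counter into one 64 bit value.

    Assumption: the type occupies the top 8 bits.
    """
    if not 0 <= event_type < (1 << TYPE_BITS):
        raise ValueError("event type must fit in 8 bits")
    # The counter is truncated to its low 56 bits.
    return (event_type << COUNTER_BITS) | (counter & COUNTER_MASK)


def unpack_perf_value(value: int) -> Tuple[int, int]:
    """Return (event_type, counter) from a packed 64 bit value."""
    return (value >> COUNTER_BITS) & 0xFF, value & COUNTER_MASK
```

Round-tripping a value through pack_perf_value() and unpack_perf_value()
returns the original pair as long as the counter fits in 56 bits.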

Since the values are for user space to do the subtraction to figure out the
difference between events, for example, the function_graph tracer may have:

             is_vmalloc_addr() {
               /* cpu_cycles: 5582263593 cache_misses: 2869004572 */
               /* cpu_cycles: 5582267527 cache_misses: 2869006049 */
             }

Just a style question: Would this mean the first line is for function entry
and the second one is function return?

Yes.
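
The user-space subtraction could look roughly like the sketch below, which
parses the two comment lines from the example and subtracts entry from exit.
The helper names are hypothetical, and wrapping the result to 56 bits is an
assumption based on the counter width mentioned earlier.

```python
import re

# Assumption: counters are 56 bits wide, so deltas wrap modulo 2**56.
COUNTER_MASK = (1 << 56) - 1
PAIR_RE = re.compile(r"(\w+):\s*(\d+)")


def parse_counters(line):
    """Extract {name: value} from a '/* name: value ... */' trace line."""
    return {name: int(value) for name, value in PAIR_RE.findall(line)}


def counter_deltas(entry_line, exit_line):
    """Return exit minus entry for every counter present in both lines."""
    entry = parse_counters(entry_line)
    exit_ = parse_counters(exit_line)
    return {
        name: (exit_[name] - entry[name]) & COUNTER_MASK
        for name in entry
        if name in exit_
    }


entry = "/* cpu_cycles: 5582263593 cache_misses: 2869004572 */"
exit_ = "/* cpu_cycles: 5582267527 cache_misses: 2869006049 */"
print(counter_deltas(entry, exit_))
# prints {'cpu_cycles': 3934, 'cache_misses': 1477}
```

Both lines must come from the same CPU for the difference to be meaningful,
since the values are per CPU raw counters.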

Perhaps we could add a field to the perf event to allow for annotation,
so the above could look like:

             is_vmalloc_addr() {
               /* --> cpu_cycles: 5582263593 cache_misses: 2869004572 */
               /* <-- cpu_cycles: 5582267527 cache_misses: 2869006049 */
             }

Or something similar?

Yeah, it looks more readable.

Thank you!

-- 
Masami Hiramatsu (Google) [off-list ref]
