Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2026-06-18 01:52:30
Also in:
lkml
On Wed, 17 Jun 2026 10:32:17 +0200 Martin Kaiser [off-list ref] wrote:
Hiramatsu-san, thank you for reviewing my patch.

Thus wrote Masami Hiramatsu (mhiramat@kernel.org):

Ah, this is a bit complicated. It seems to work with sched_switch event as commit f04dec93466a ("tracing/eprobes: Fix reading of string fields"):

echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
              sh-162     [002] d..3.    54.027213: sw: (sched.sched_switch) comm="swapper/2"
          <idle>-0       [007] d..3.    54.034573: sw: (sched.sched_switch) comm="rcu_preempt"
     rcu_preempt-15      [007] d..3.    54.034589: sw: (sched.sched_switch) comm="swapper/7"
Maybe comm is stored as a fixed string information in the event record?

Yes, this example does not execute my change.
/sys/kernel/tracing # cat events/sched/sched_switch/format
name: sched_switch
ID: 254
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[16];       offset:8;       size:16;        signed:0;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[16];       offset:40;      size:16;        signed:0;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;
But the filename is a pointer.
/sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 705
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:int dfd;  offset:16;      size:8; signed:0;
        field:const char * filename;    offset:24;      size:8; signed:0;
        field:int flags;        offset:32;      size:8; signed:0;
        field:umode_t mode;     offset:40;      size:8; signed:0;
        field:__data_loc char[] __filename_val; offset:48;      size:4; signed:0;
In this case, the filename field should use __data_loc directly instead of pointing to data on the ring buffer.
Can you try

echo 'e syscalls.sys_enter_openat $__filename_val:string' > \
    /sys/kernel/tracing/dynamic_events
Instead?

This field is working as expected. I still believe that the handling of FILTER_PTR_STRING is not correct. The pointer is stored in the ringbuffer as unsigned long and read as a char. This gives us a truncated pointer that cannot be dereferenced.
Ah, OK. I understand the problem.

- The ring buffer and its records should be self-contained.
- In most cases, events use __data_loc/__rel_loc or a fixed array to store strings.
- Only syscall events expose the char *, which is not recommended but is important for debugging user space. (It is not meant to be dereferenced.)

The example usage of FILTER_PTR_STRING is actually using FILTER_STATIC_STRING now, so FILTER_PTR_STRING is left broken. (Hmm, but many "const char *" fields are still used, especially under rcu events...)

OK, can you update your patch description to use rcu events?

BTW, I think those should also be decoded from an enum value in the events, or use __rel_loc, since they are not self-contained. (It's a TODO item.)
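
For readers who have not looked at how a __data_loc field keeps the record self-contained, here is a rough userspace sketch. It assumes the usual encoding of the 32-bit __data_loc word: the offset of the data from the start of the event record in the low 16 bits and its length in the high 16 bits. The helper names are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Hypothetical helper: offset of the string from the start of the record. */
static inline uint16_t data_loc_offset(uint32_t loc)
{
	return (uint16_t)(loc & 0xffff);
}

/* Hypothetical helper: length of the stored data, in bytes. */
static inline uint16_t data_loc_len(uint32_t loc)
{
	return (uint16_t)(loc >> 16);
}

/*
 * Hypothetical helper: resolve a __data_loc word to the string it describes.
 * The result points inside the record itself, so no pointer to memory
 * outside the record has to be followed.
 */
static inline const char *data_loc_str(const void *record, uint32_t loc)
{
	return (const char *)record + data_loc_offset(loc);
}
```

With the sys_enter_openat format above, __filename_val is such a 4-byte word at offset 48, and the string it describes lives later in the same record.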
I think a better solution is fixing the syscall tracer.

I would say that syscall trace is doing the right thing. The ringbuffer entry is a struct syscall_trace_enter, the syscall arguments are unsigned longs. They are written in ftrace_syscall_enter, this looks correct to me.
OK, I thought the filename pointed into the ring buffer, but it actually points to user space (the raw parameter values are saved). So it is OK.

Eprobe users should not access user-space data directly, because that can cause a page fault in the kernel without a fixup. It may work on x86, but it doesn't work on other architectures which have a separate address space for user space. To avoid such mistakes, the syscall event saves the actual string in the ring buffer as __filename_val.

Hmm, this must be documented in the eprobe example code...
A const char * syscall argument is using FILTER_PTR_STRING, the unsigned long argument from the ringbuffer is read as a char and then converted to a truncated pointer.
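
To make the truncation concrete, here is a minimal userspace sketch, not the kernel code itself. The helper names read_field_as_char() and read_field_as_ulong() are made up for illustration, and unsigned char is used so that the result does not depend on the signedness of char.

```c
#include <string.h>

/*
 * Hypothetical helper: load only the first byte of the stored field, as a
 * char-sized read would, then widen it back to unsigned long.
 */
static unsigned long read_field_as_char(const void *field)
{
	unsigned char c;

	memcpy(&c, field, sizeof(c));
	return (unsigned long)c;
}

/* Hypothetical helper: load the complete unsigned long that was stored. */
static unsigned long read_field_as_ulong(const void *field)
{
	unsigned long v;

	memcpy(&v, field, sizeof(v));
	return v;
}
```

For any stored value that does not fit in one byte, the char-sized read returns at most 0xff, so the result can no longer be used as the original pointer. The full-width read returns the stored value unchanged.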
Thanks,

-- 
Masami Hiramatsu (Google) [off-list ref]