Thread (6 messages) 6 messages, 2 authors, 2d ago

Re: [PATCH] tracing: eprobe: read the complete FILTER_PTR_STRING pointer

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2026-06-18 01:52:30
Also in: lkml

On Wed, 17 Jun 2026 10:32:17 +0200
Martin Kaiser [off-list ref] wrote:
Hiramatsu-san,

thank you for reviewing my patch.

Thus wrote Masami Hiramatsu (mhiramat@kernel.org):
quoted
Ah, this is a bit complicated. It seems to work with sched_switch event
as commit f04dec93466a ("tracing/eprobes: Fix reading of string fields"):
quoted
echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events
quoted
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
              sh-162     [002] d..3.    54.027213: sw: (sched.sched_switch) comm="swapper/2"
          <idle>-0       [007] d..3.    54.034573: sw: (sched.sched_switch) comm="rcu_preempt"
     rcu_preempt-15      [007] d..3.    54.034589: sw: (sched.sched_switch) comm="swapper/7"
quoted
Maybe comm is stored as a fixed string information in the event record?
Yes, this example does not execute my change.
quoted
/sys/kernel/tracing # cat events/sched/sched_switch/format 
name: sched_switch
ID: 254
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
quoted
	field:char prev_comm[16];	offset:8;	size:16;	signed:0;
	field:pid_t prev_pid;	offset:24;	size:4;	signed:1;
	field:int prev_prio;	offset:28;	size:4;	signed:1;
	field:long prev_state;	offset:32;	size:8;	signed:1;
	field:char next_comm[16];	offset:40;	size:16;	signed:0;
	field:pid_t next_pid;	offset:56;	size:4;	signed:1;
	field:int next_prio;	offset:60;	size:4;	signed:1;
quoted
But the filename is a pointer.
quoted
/sys/kernel/tracing # cat events/syscalls/sys_enter_openat/format 
name: sys_enter_openat
ID: 705
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
quoted
	field:int __syscall_nr;	offset:8;	size:4;	signed:1;
	field:int dfd;	offset:16;	size:8;	signed:0;
	field:const char * filename;	offset:24;	size:8;	signed:0;
	field:int flags;	offset:32;	size:8;	signed:0;
	field:umode_t mode;	offset:40;	size:8;	signed:0;
	field:__data_loc char[] __filename_val;	offset:48;	size:4;	signed:0;
quoted
In this case, the filename field should use __data_loc directly instead of
pointing data on the ring buffer.
quoted
Can you try 
quoted
echo 'e syscalls.sys_enter_openat $__filename_val:string' > \
 		/sys/kernel/tracing/dynamic_events
quoted
Instead?
This field is working as expected.

I still believe that the handling of FILTER_PTR_STRING is not correct. The
pointer is stored in the ringbuffer as unsigned long and read as a char. This
gives us a truncated pointer that cannot be dereferenced.
Ah, OK. I understand the problem.

 - ring buffer and its records should be self-contained.
 - In most cases, events use __data_loc/__rel_loc or fixed array to store
   strings.
 - only syscall events exposes the char *, which is not recommended but
   important to debug user space. (not for dereference)

The example usage of FILTER_PTR_STRING is actually using FILTER_STATIC_STRING
now, so FILTER_PTR_STRING is left broken. (hmm, but there are many
 "const char *" are used especially under rcu events...)

OK, can you update your patch description to use rcu events?

BTW, I think those also should be decoded from enum value in the events,
or use __rel_loc. Since it is not self-contained. (it's a TODO item)
quoted
I think better solution is fixing sycall tracer.
I would say that syscall trace is doing the right thing. The ringbuffer entry
is a struct syscall_trace_enter, the syscall arguments are unsigned longs.
They are written in ftrace_syscall_enter, this looks correct to me.
OK, I thought the filename points the ringbuffer, but it actually points
the user space. (saving a raw parameter values) So it is OK.

For eprobe users, it should not access to the user space data directly
because it can cause page fault in the kernel without fixup. It may work
on x86, but it doesn't work on other architecture which has separated
address space for user space. To avoid such mistake, it saves actual
string in the ringbuffer as __filename_val.

Hmm, this must be documented in eprobe example code...
A const char * syscall argument is using FILTER_PTR_STRING, the unsigned long
argument from the ringbuffer is read as a char and then converted to a
truncated pointer.

Thanks,

-- 
Masami Hiramatsu (Google) [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help