Re: [RFC PATCH 1/2] tracing: fixes of ftrace_enable_fops
From: Gabriele Paoloni <hidden>
Date: 2025-06-19 17:07:49
Also in:
lkml
Hi Masami On Fri, Jun 13, 2025 at 4:45 AM Masami Hiramatsu [off-list ref] wrote:
On Thu, 12 Jun 2025 12:43:48 +0200 Gabriele Paoloni [off-list ref] wrote:quoted
Currently there are different issues associated with ftrace_enable_fops - event_enable_write: *ppos is increased while not used at all in the write operation itself (following a write, this could lead a read to fail or report a corrupted event status);Here, we expected the "enable" file is a pseudo text file. So if there is a write, the ppos should be incremented.quoted
- event_enable_read: cnt < strlen(buf) is allowed and this can lead to reading an incomplete event status (i.e. not all status characters are retrieved) and/or reading the status in a non-atomic way (i.e. the status could change between two consecutive reads);As I said, the "enable" file is a kind of text file. So reader must read it until EOF. If you need to get the consistent result, user should use the enough size of buffer.
What I am concerned about are scenarios like the one below:
---
# strace -Tfe trace=openat,open,read,write scat 1
/sys/kernel/tracing/events/kprobes/ev1/enable
open("/sys/kernel/tracing/events/kprobes/ev1/enable",
O_RDONLY|O_LARGEFILE) = 3 <0.000237>
Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
read fd=3, 1
read(3, "0", 1) = 1 <0.000099>
1 bytes Read
30,
read(3, "\n", 1) = 1 <0.000095>
1 bytes Read
0a,
read(3, "", 1) = 0 <0.000102>
close fd=3+++ exited with 0 +++---
So in this case there are 2 consecutive reads byte by byte that
could lead to inconsistent results if in the meantime the event
status has changed.
With the proposed patchset the same test would result in a failure
as per log below:
---
# strace -Tfe trace=openat,open,read,write scat 1
/sys/kernel/tracing/events/kprobes/ev1/enable
open("/sys/kernel/tracing/events/kprobes/ev1/enable",
O_RDONLY|O_LARGEFILE) = 3 <0.000227>
Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
read fd=3, 1
read(3, 0x7ffd960234e0, 1) = -1 EINVAL (Invalid argument)
<0.000228>
close fd=3+++ exited with 0 +++--- On the other side the proposed patchset would be still compatible with “cat” or “scat 2” commands that continue to work as they do today.
quoted
- .llseek is set to default_llseek: this is wrong since for this type of files it does not make sense to reposition the ppos offset. Hence this should be set instead to noop_llseek.As I said, it is a kind of text file, default_llseek is better. But, if we change (re-design) what is this "enable" file is, we can accept these changes. So this is not a "Fix" but re-design of the "enable" file as an interface (as a char device), not a text file (or a block device). I want to keep this as is, same as other tracefs files.
IMO it is a redesign that is enforcing the user to avoid erroneous usages of enable files (because the reads are either reporting the whole and correct status of the event or failing to read; also the user cannot llseek into a position that would lead to not reading or reading a corrupted status). On the other hand the proposed re-design is fully compatible with the current user space commands reading and writing to the enable files. If the concern is having inconsistent implementations between tracefs files, I am happy to extend this patchset to all traces files, however, before doing so, I would like to have your approval. Otherwise I will just document the current functions and associated assumptions of use that the user must comply with in order to avoid the erroneous behaviour. Thanks a lot for your inputs and clarifications. Gab
Thank you,quoted
This patch fixes all the issues listed above. Signed-off-by: Gabriele Paoloni <redacted> Tested-by: Alessandro Carminati <redacted> --- kernel/trace/trace_events.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index 120531268abf..5e84ef01d0c8 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c@@ -1798,6 +1798,13 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt, strcat(buf, "\n"); + /* + * A requested cnt less than strlen(buf) could lead to a wrong + * event status being reported. + */ + if (cnt < strlen(buf)) + return -EINVAL; + return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf)); }@@ -1833,8 +1840,6 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt, return -EINVAL; } - *ppos += cnt; - return cnt; }@@ -2557,7 +2562,7 @@ static const struct file_operations ftrace_enable_fops = { .read = event_enable_read, .write = event_enable_write, .release = tracing_release_file_tr, - .llseek = default_llseek, + .llseek = noop_llseek, }; static const struct file_operations ftrace_event_format_fops = { --2.48.1-- Masami Hiramatsu (Google) [off-list ref]