Thread (13 messages) 13 messages, 3 authors, 2025-07-02

Re: [RFC PATCH 1/2] tracing: fixes of ftrace_enable_fops

From: Gabriele Paoloni <hidden>
Date: 2025-06-19 17:07:49
Also in: lkml

Hi Masami

On Fri, Jun 13, 2025 at 4:45 AM Masami Hiramatsu [off-list ref] wrote:
On Thu, 12 Jun 2025 12:43:48 +0200
Gabriele Paoloni [off-list ref] wrote:
quoted
Currently there are different issues associated with ftrace_enable_fops
- event_enable_write: *ppos is increased while not used at all in the
  write operation itself (following a write, this could lead a read to
  fail or report a corrupted event status);
Here, we expected the "enable" file is a pseudo text file. So if
there is a write, the ppos should be incremented.
quoted
- event_enable_read: cnt < strlen(buf) is allowed and this can lead to
  reading an incomplete event status (i.e. not all status characters
  are retrieved) and/or reading the status in a non-atomic way (i.e.
  the status could change between two consecutive reads);
As I said, the "enable" file is a kind of text file. So reader must read
it until EOF. If you need to get the consistent result, user should
use the enough size of buffer.
What I am concerned about are scenarios like the one below:
---
# strace -Tfe trace=openat,open,read,write scat 1
/sys/kernel/tracing/events/kprobes/ev1/enable
open("/sys/kernel/tracing/events/kprobes/ev1/enable",
O_RDONLY|O_LARGEFILE) = 3 <0.000237>
Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
read fd=3, 1
read(3, "0", 1)                         = 1 <0.000099>
1 bytes Read
30,
read(3, "\n", 1)                        = 1 <0.000095>
1 bytes Read
0a,
read(3, "", 1)                          = 0 <0.000102>
close fd=3
+++ exited with 0 +++
---
So in this case there are 2 consecutive reads byte by byte that
could lead to inconsistent results if in the meantime the event
status has changed.
With the proposed patchset the same test would result in a failure
as per log below:
---
# strace -Tfe trace=openat,open,read,write scat 1
/sys/kernel/tracing/events/kprobes/ev1/enable
open("/sys/kernel/tracing/events/kprobes/ev1/enable",
O_RDONLY|O_LARGEFILE) = 3 <0.000227>
Open /sys/kernel/tracing/events/kprobes/ev1/enable ->fd=3
read fd=3, 1
read(3, 0x7ffd960234e0, 1)              = -1 EINVAL (Invalid argument)
<0.000228>
close fd=3
+++ exited with 0 +++
---
On the other side the proposed patchset would be still compatible with
“cat” or “scat 2” commands that continue to work as they do today.
quoted
- .llseek is set to default_llseek: this is wrong since for this
  type of files it does not make sense to reposition the ppos offset.
  Hence this should be set instead to noop_llseek.
As I said, it is a kind of text file, default_llseek is better.

But, if we change (re-design) what is this "enable" file is,
we can accept these changes. So this is not a "Fix" but re-design
of the "enable" file as an interface (as a char device), not a text
file (or a block device).

I want to keep this as is, same as other tracefs files.
IMO it is a redesign that is enforcing the user to avoid erroneous
usages of enable files (because the reads are either reporting the
whole and correct status of the event or failing to read; also the user
cannot llseek into a position that would lead to not reading or reading
a corrupted status).

On the other hand the proposed re-design is fully compatible with
the current user space commands reading and writing to the enable
files.

If the concern is having inconsistent implementations between tracefs
files, I am happy to extend this patchset to all traces files, however,
before doing so, I would like to have your approval.
Otherwise I will just document the current functions and associated
assumptions of use that the user must comply with in order to avoid
the erroneous behaviour.

Thanks a lot for your inputs and clarifications.
Gab
Thank you,
quoted
This patch fixes all the issues listed above.

Signed-off-by: Gabriele Paoloni <redacted>
Tested-by: Alessandro Carminati <redacted>
---
 kernel/trace/trace_events.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 120531268abf..5e84ef01d0c8 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1798,6 +1798,13 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,

      strcat(buf, "\n");

+     /*
+      * A requested cnt less than strlen(buf) could lead to a wrong
+      * event status being reported.
+      */
+     if (cnt < strlen(buf))
+             return -EINVAL;
+
      return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
 }
@@ -1833,8 +1840,6 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
              return -EINVAL;
      }

-     *ppos += cnt;
-
      return cnt;
 }
@@ -2557,7 +2562,7 @@ static const struct file_operations ftrace_enable_fops = {
      .read = event_enable_read,
      .write = event_enable_write,
      .release = tracing_release_file_tr,
-     .llseek = default_llseek,
+     .llseek = noop_llseek,
 };

 static const struct file_operations ftrace_event_format_fops = {
--
2.48.1

--
Masami Hiramatsu (Google) [off-list ref]
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help