Re: [PATCH v2 1/3] tracing/user_events: Fix incorrect return value for... | linux-trace-kernel

Re: [PATCH v2 1/3] tracing/user_events: Fix incorrect return value for writing operation when events are disabled

From: Beau Belgrave <hidden>
Date: 2023-06-19 18:41:22
Also in: linux-kselftest, lkml

On Mon, Jun 19, 2023 at 04:51:56PM +0800, sunliming wrote:

Beau Belgrave [off-list ref] 于2023年6月17日周六 00:08写道：

quoted

On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote:

quoted

The writing operation return the count of writes whether events are
enabled or disabled. This is incorrect when events are disabled. Fix
this by just return -ENOENT when events are disabled.

When testing this patch locally I found that we would occasionally get
-ENOENT when events were enabled, but then become disabled, since writes
do not have any locking around the tracepoint checks for performance
reasons.

I've asked a few peers of mine their thoughts on this, whether an error
should result when there are no enabled events. The consensus I've heard
back is that they would not consider this case an actual error, just as
writing to /dev/null does not actually return an error.

However, if you feel strongly we need this and have a good use case, it
seems better to enable this logic behind a flag instead of having it
default based on my conversations with others.

Thanks,
-Beau



There is indeed a problem. Once enabled, perform the write operation
immediately.

The immediate write does work, and gets put into a buffer. The ftrace
and perf self tests do the above case. So, no worries at this point.

Now，when the event is disabled, the trace record appears to be lost.

I'm taking this to mean, if in between the time of the bit check and the
actual write() /writev() syscall the event becomes disabled, the event
won't write to the buffer. Yes, that is expected.

In some situations
where data timing is sensitive, it may cause confusion. In this case,
not returning an
error (as mentioned in your reply, it is not considered this case an
actual error) and
returning 0 ( meaning that the number of data to be written is 0) may
be a good way
to handle it?

This is where I get a little lost. What would a user process do with a
return of 0 bytes? It shouldn't retry, since it just hit that small
timing window. In reality, it just incurred a temporary excessive
syscall cost, but no real data loss (the operator/admin turned the event
off).

I'm missing why you feel it's important the user process know such a
window was hit?

Can you help me understand that?

I do think returning 0 bytes is better than an error here, but I'd
really like to know why the user process wants to know at all. Maybe
they have user-space only logging and want to be able to mark there if
it's in both spots (kernel and user buffers)?

Thanks,
-Beau

Thanks,
-Sunliming

quoted

Signed-off-by: sunliming <redacted>
---
 kernel/trace/trace_events_user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 1ac5ba5685ed..92204bbe79da 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c

@@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i)

              if (unlikely(faulted))
                      return -EFAULT;
-     }
+     } else
+             return -ENOENT;

      return ret;
 }
--

2.25.1

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help