Re: [RFC PATCH] tracing: change syscall number type in struct syscall_trace_*

From: Artem Savkov <hidden>
Date: 2023-10-04 12:56:43
Also in: bpf, linux-rt-users, lkml, netdev

On Tue, Oct 03, 2023 at 09:38:44PM -0400, Steven Rostedt wrote:

On Mon,  2 Oct 2023 15:52:42 +0200
Artem Savkov [off-list ref] wrote:

quoted

linux-rt-devel tree contains a patch that adds an extra member to struct
trace_entry. This causes the offset of args field in struct
trace_event_raw_sys_enter be different from the one in struct
syscall_trace_enter:

This patch looks like it's fixing the symptom and not the issue. No code
should rely on the two event structures to be related. That's an unwanted
coupling, that will likely cause issues down the road (like the RT patch
you mentioned).

I agree, but I didn't see a better solution and that was my way of
starting conversation, thus the RFC.

quoted

struct trace_event_raw_sys_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */
        /* XXX 4 bytes hole, try to pack */

        long int                   id;                   /*    16     8 */
        long unsigned int          args[6];              /*    24    48 */
        /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
        char                       __data[];             /*    72     0 */

        /* size: 72, cachelines: 2, members: 4 */
        /* sum members: 68, holes: 1, sum holes: 4 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 8 bytes */
};

struct syscall_trace_enter {
        struct trace_entry         ent;                  /*     0    12 */

        /* XXX last struct has 3 bytes of padding */

        int                        nr;                   /*    12     4 */
        long unsigned int          args[];               /*    16     0 */

        /* size: 16, cachelines: 1, members: 3 */
        /* paddings: 1, sum paddings: 3 */
        /* last cacheline: 16 bytes */
};

This, in turn, causes perf_event_set_bpf_prog() fail while running bpf
test_profiler testcase because max_ctx_offset is calculated based on the
former struct, while off on the latter:

The above appears to be pointing to the real bug. The "is calculated based
on the former struct while off on the latter" Why are the two being used
together? They are supposed to be *unrelated*!

quoted

  10488         if (is_tracepoint || is_syscall_tp) {
  10489                 int off = trace_event_get_offsets(event->tp_event);

So basically this is clumping together the raw_syscalls with the syscalls
events as if they are the same. But the are not. They are created
differently. It's basically like using one structure to get the offsets of
another structure. That would be a bug anyplace else in the kernel. Sounds
like it's a bug here too.

I think the issue is with this code, not the tracing code.

We could expose the struct syscall_trace_enter and syscall_trace_exit if
the offsets to those are needed.

I don't think we need syscall_trace_* offsets, looks like
trace_event_get_offsets() should return offset trace_event_raw_sys_enter
instead. I am still trying to figure out how all of this works together.
Maybe Alexei or Andrii have more context here.

-- 
 Artem

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help