Re: [PATCH bpf] tools/runqslower: fix handle__sched_switch for updated tp sched_switch
From: Song Liu <hidden>
Date: 2022-03-30 06:43:31
Also in:
bpf
On Mar 29, 2022, at 7:47 PM, Andrii Nakryiko [off-list ref] wrote: On Tue, Mar 29, 2022 at 5:39 PM Song Liu [off-list ref] wrote:quoted
quoted
On Mar 29, 2022, at 5:00 PM, Andrii Nakryiko [off-list ref] wrote: On Tue, Mar 29, 2022 at 4:19 PM Song Liu [off-list ref] wrote:quoted
TP_PROTO of sched_switch is updated with a new arg prev_state, which causes runqslower load failure: libbpf: prog 'handle__sched_switch': BPF program load failed: Permission denied libbpf: prog 'handle__sched_switch': -- BEGIN PROG LOAD LOG -- R1 type=ctx expected=fp 0: R1=ctx(off=0,imm=0) R10=fp0 ; int handle__sched_switch(u64 *ctx) 0: (bf) r7 = r1 ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0) ; struct task_struct *next = (struct task_struct *)ctx[2]; 1: (79) r6 = *(u64 *)(r7 +16) func 'sched_switch' arg2 has btf_id 186 type STRUCT 'task_struct' 2: R6_w=ptr_task_struct(off=0,imm=0) R7_w=ctx(off=0,imm=0) ; struct task_struct *prev = (struct task_struct *)ctx[1]; 2: (79) r2 = *(u64 *)(r7 +8) ; R2_w=scalar() R7_w=ctx(off=0,imm=0) 3: (b7) r1 = 0 ; R1_w=0 ; struct runq_event event = {}; 4: (7b) *(u64 *)(r10 -8) = r1 ; R1_w=P0 R10=fp0 fp-8_w=00000000 5: (7b) *(u64 *)(r10 -16) = r1 ; R1_w=P0 R10=fp0 fp-16_w=00000000 6: (7b) *(u64 *)(r10 -24) = r1 ; R1_w=P0 R10=fp0 fp-24_w=00000000 7: (7b) *(u64 *)(r10 -32) = r1 ; R1_w=P0 R10=fp0 fp-32_w=00000000 ; if (prev->__state == TASK_RUNNING) 8: (61) r1 = *(u32 *)(r2 +24) R2 invalid mem access 'scalar' processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: failed to load program 'handle__sched_switch' libbpf: failed to load object 'runqslower_bpf' libbpf: failed to load BPF skeleton 'runqslower_bpf': -13 failed to load BPF object: -13 Update runqslower to fix this issue. Also, as we are on this, use BPF_PROG in runqslower for cleaner code. Fixes: fa2c3254d7cf ("sched/tracing: Don't re-read p->state when emitting sched_switch event") Signed-off-by: Song Liu <song@kernel.org> --- tools/bpf/runqslower/runqslower.bpf.c | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-)It would be much less disruptive if that prev_state was added after "next", but oh well...Maybe we should change that. +Valentin and Steven, how about we change the order with the attached diff (not the original patch in this thread, but the one at the end of this email)?quoted
But anyways, let's handle this in a way that can handle both old kernels and new ones and do the same change in libbpf-tool's runqslower ([0]). Can you please follow up there as well?Yeah, I will also fix that one.Thanks!quoted
quoted
We can use BPF CO-RE to detect which order of arguments running kernel has by checking prev_state field existence in struct trace_event_raw_sched_switch. Can you please try that? Use bpf_core_field_exists() for that.Do you mean something like if (bpf_core_field_exists(ctx->prev_state)) /* use ctx[2] and ctx[3] */ else /* use ctx[1] and ctx[2] */yep, that's what I meant, except you don't have ctx->prev_state, you have to do: if (bpf_core_field_exists(((struct trace_event_raw_sched_switch *)0)->prev_state))quoted
? I think we will need BTF for the arguments, which doesn't exist yet. Did I miss something?Probably :) struct trace_event_raw_sched_switch is in vmlinux.h already for non-raw sched:sched_switch tracepoint. We just use that type information for feature probing. From BPF verifier's perspective, ctx[1] or ctx[2] will have proper BTF information (task_struct) during program validation.
Sigh. I run pahole and saw trace_event_raw_sched_switch. Somehow I thought it was not the one. Will send v2 tomorrow. Thanks, Song