Re: [PATCH -trace] trace: fix TASK_COMM_LEN in trace event format file
From: Alexei Starovoitov <hidden>
Date: 2023-02-10 22:32:33
Also in:
stable
On Fri, Feb 10, 2023 at 10:13 AM Kajetan Puchalski [off-list ref] wrote:
Hi Yafang,quoted
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/debug/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs. The result as follows after this change, $ cat /sys/kernel/debug/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0; Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz <jstultz@google.com> Debugged-by: Mathieu Desnoyers [off-list ref] Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Yafang Shao <redacted> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Alexei Starovoitov <redacted> Cc: Kajetan Puchalski <redacted> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: John Stultz <jstultz@google.com> Cc: stable@vger.kernel.org # v5.17+From my testing this works the same with older Perfetto as the previous diff you sent in the older thread, ie this type of errors: Name Value Type mismatched_sched_switch_tids 2439 error (analysis) systrace_parse_failure 3853 error (analysis) Meaning that applying this patch on top of the old one ends up with different results than just reverting the old one so not a 100% fix but as I said before, still an improvement.
fwiw I don't mind reverting commit 3087c61ed2c4.
For the record it didn't go through the bpf tree.
It went through mm.
But first we need to define which part of ftrace format is an abi.
I think it's a format of tracing/event/foo/format file
and not the contents of it.
The tracepoints come and go. Their arguments get changed too.
So the contents of ftrace format files change.
In this case Perfetto stumbled on parsing
field:char comm[TASK_COMM_LEN]; offset:8;
We probably should define that 'field: value offset: value'
is an abi, but value-s can change.
Say offset:8 becomes offset:+8, for whatever reason.
If Perfetto fails to parse it, it's a Perfetto bug.
In this case it failed to parse char comm[TASK_COMM_LEN];
But that's not the only such "field:".
These three were there for a long time.
field:u32 rates[NUM_NL80211_BANDS]; offset:16; size:24;
field:int mcast_rate[NUM_NL80211_BANDS]; offset:60; size:24;
field:int mcast_rate[NUM_NL80211_BANDS]; offset:108; size:24;
I suspect Perfetto didn't have a use case to parse them,
so the bug stayed dormant and a change in TASK_COMM_LEN triggered it.
We can use Yafang's patch to do:
-field:%.*s %s%s;
+field:%.*s %s[%d];
but it will affect both TASK_COMM_LEN and NUM_NL80211_BANDS.
In summary the TASK_COMM_LEN change from #define to enum didn't
introduce anything new in the kind of "value"s being printed
in ftrace files. Hence I'm arguing there is no abi breakage.
There was a question why change from #define to enum is useful
to bpf. The reason is that #define-s are not seen in dwarf
whereas enums and their values are. bpf tooling has ways to extract
that data and auto-adjust bpf programs when enum-s disappear
or their values change.