Re: [PATCH v2 7/7] tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN
From: Yafang Shao <hidden>
Date: 2023-02-09 06:21:53
Also in:
linux-fsdevel, linux-mm, linux-perf-users, lkml, netdev
On Thu, Feb 9, 2023 at 10:07 AM Mathieu Desnoyers [off-list ref] wrote:
On 2023-02-08 19:54, John Stultz wrote:quoted
On Wed, Feb 8, 2023 at 4:11 PM Alexei Starovoitov [off-list ref] wrote:quoted
On Wed, Feb 8, 2023 at 2:01 PM John Stultz [off-list ref] wrote:quoted
On Sat, Nov 20, 2021 at 11:27:38AM +0000, Yafang Shao wrote:quoted
As the sched:sched_switch tracepoint args are derived from the kernel, we'd better make it same with the kernel. So the macro TASK_COMM_LEN is converted to type enum, then all the BPF programs can get it through BTF. The BPF program which wants to use TASK_COMM_LEN should include the header vmlinux.h. Regarding the test_stacktrace_map and test_tracepoint, as the type defined in linux/bpf.h are also defined in vmlinux.h, so we don't need to include linux/bpf.h again. Signed-off-by: Yafang Shao <redacted> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Hildenbrand <redacted> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <redacted> Cc: Andrii Nakryiko <redacted> Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Hildenbrand <redacted> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <redacted> Cc: Petr Mladek <pmladek@suse.com> --- include/linux/sched.h | 9 +++++++-- tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 +++--- tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 +++--- 3 files changed, 13 insertions(+), 8 deletions(-)Hey all, I know this is a little late, but I recently got a report that this change was causiing older versions of perfetto to stop working. Apparently newer versions of perfetto has worked around this via the following changes: https://android.googlesource.com/platform/external/perfetto/+/c717c93131b1b6e3705a11092a70ac47c78b731d%5E%21/ https://android.googlesource.com/platform/external/perfetto/+/160a504ad5c91a227e55f84d3e5d3fe22af7c2bb%5E%21/ But for older versions of perfetto, reverting upstream commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") is necessary to get it back to working. I haven't dug very far into the details, and obviously this doesn't break with the updated perfetto, but from a high level this does seem to be a breaking-userland regression. So I wanted to reach out to see if there was more context for this breakage? I don't want to raise a unnecessary stink if this was an unfortuante but forced situation.Let me understand what you're saying... The commit 3087c61ed2c4 did -/* Task command name length: */ -#define TASK_COMM_LEN 16 +/* + * Define the task command name length as enum, then it can be visible to + * BPF programs. + */ +enum { + TASK_COMM_LEN = 16, +}; and that caused: cat /sys/kernel/debug/tracing/events/task/task_newtask/format to print field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; instead of field:char comm[16]; offset:12; size:16; signed:0; so the ftrace parsing android tracing tool had to do: - if (Match(type_and_name.c_str(), R"(char [a-zA-Z_]+\[[0-9]+\])")) { + if (Match(type_and_name.c_str(), + R"(char [a-zA-Z_][a-zA-Z_0-9]*\[[a-zA-Z_0-9]+\])")) { to workaround this change. Right?I believe so.quoted
And what are you proposing?I'm not proposing anything. I was just wanting to understand more context around this, as it outwardly appears to be a user-breaking change, and that is usually not done, so I figured it was an issue worth raising. If the debug/tracing/*/format output is in the murky not-really-abi space, that's fine, but I wanted to know if this was understood as something that may require userland updates or if this was a unexpected side-effect.If you are looking at the root cause in the kernel code generating this: kernel/trace/trace_events.c:f_show() /* * Smartly shows the array type(except dynamic array). * Normal: * field:TYPE VAR * If TYPE := TYPE[LEN], it is shown: * field:TYPE VAR[LEN] */ where it uses the content of field->type (a string) to format the VAR[LEN] part. This in turn is the result of the definition of the struct trace_event_fields done in: include/trace/trace_events.h at stage 4, thus with the context of those macros defined: include/trace/stages/stage4_event_fields.h: #undef __array #define __array(_type, _item, _len) { \ .type = #_type"["__stringify(_len)"]", .name = #_item, \ .size = sizeof(_type[_len]), .align = ALIGN_STRUCTFIELD(_type), \ .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }, I suspect the real culprit here is the use of __stringify(_len), which happens to work on macros, but not on enum labels. One possible solution to make this more robust would be to extend struct trace_event_fields with one more field that indicates the length of an array as an actual integer, without storing it in its stringified form in the type, and do the formatting in f_show where it belongs. This way everybody can stay happy and no ABI is broken. Thoughts ?
Many thanks for the detailed analysis. Seems it can work. Hi John, Could you pls. try the attached fix ? I have verified it in my test env. -- Regards Yafang
Attachments
- 0001-trace-fix-TASK_COMM_LEN-in-trace-event-format-file.patch [application/octet-stream] 5019 bytes · preview