Thread (16 messages) 16 messages, 6 authors, 2023-09-13

Re: [PATCH v3] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

From: Masahiro Yamada <masahiroy@kernel.org>
Date: 2023-09-06 15:07:03
Also in: linux-kbuild, live-patching, lkml

On Wed, Sep 6, 2023 at 7:09 PM Alessandro Carminati
[off-list ref] wrote:
Hello Masahiro,

Thank you for your suggestions,
Il giorno sab 2 set 2023 alle ore 08:36 Masahiro Yamada
[off-list ref] ha scritto:
quoted
On Mon, Aug 28, 2023 at 8:45 PM Alessandro Carminati (Red Hat)
[off-list ref] wrote:
quoted
From: Alessandro Carminati <redacted>

It is not uncommon for drivers or modules related to similar peripherals
to have symbols with the exact same name.
While this is not a problem for the kernel's binary itself, it becomes an
issue when attempting to trace or probe specific functions using
infrastructure like ftrace or kprobe.

The tracing subsystem relies on the `nm -n vmlinux` output, which provides
symbol information from the kernel's ELF binary. However, when multiple
symbols share the same name, the standard nm output does not differentiate
between them. This can lead to confusion and difficulty when trying to
probe the intended symbol.

 ~ # cat /proc/kallsyms | grep " name_show"
 ffffffff8c4f76d0 t name_show
 ffffffff8c9cccb0 t name_show
 ffffffff8cb0ac20 t name_show
 ffffffff8cc728c0 t name_show
 ffffffff8ce0efd0 t name_show
 ffffffff8ce126c0 t name_show
 ffffffff8ce1dd20 t name_show
 ffffffff8ce24e70 t name_show
 ffffffff8d1104c0 t name_show
 ffffffff8d1fe480 t name_show

**kas_alias** addresses this challenge by extending the symbol names with
unique suffixes during the kernel build process.
The newly created aliases for these duplicated symbols are unique names
that can be fed to the ftracefs interface. By doing so, it enables
previously unreachable symbols to be probed.

 ~ # cat /proc/kallsyms | grep " name_show"
 ffffffff974f76d0 t name_show
 ffffffff974f76d0 t name_show__alias__6340
 ffffffff979cccb0 t name_show
 ffffffff979cccb0 t name_show__alias__6341
 ffffffff97b0ac20 t name_show
 ffffffff97b0ac20 t name_show__alias__6342
 ffffffff97c728c0 t name_show
 ffffffff97c728c0 t name_show__alias__6343
 ffffffff97e0efd0 t name_show
 ffffffff97e0efd0 t name_show__alias__6344
 ffffffff97e126c0 t name_show
 ffffffff97e126c0 t name_show__alias__6345
 ffffffff97e1dd20 t name_show
 ffffffff97e1dd20 t name_show__alias__6346
 ffffffff97e24e70 t name_show
 ffffffff97e24e70 t name_show__alias__6347
 ffffffff981104c0 t name_show
 ffffffff981104c0 t name_show__alias__6348
 ffffffff981fe480 t name_show
 ffffffff981fe480 t name_show__alias__6349
 ~ # echo "p:kprobes/evnt1 name_show__alias__6349" \
 > >/sys/kernel/tracing/kprobe_events
 ~ # cat /sys/kernel/tracing/kprobe_events
 p:kprobes/evnt1 name_show__alias__6349

Changes from v1:
- Integrated changes requested by Masami to exclude symbols with prefixes
  "_cfi" and "_pfx".
- Introduced a small framework to handle patterns that need to be excluded
  from the alias production.
- Excluded other symbols using the framework.
- Introduced the ability to discriminate between text and data symbols.
- Added two new config symbols in this version: CONFIG_KALLSYMS_ALIAS_DATA,
  which allows data for data, and CONFIG_KALLSYMS_ALIAS_DATA_ALL, which
  excludes all filters and provides an alias for each duplicated symbol.

https://lore.kernel.org/all/20230711151925.1092080-1-alessandro.carminati@gmail.com/ (local)

Changes from v2:
- Alias tags are created by querying DWARF information from the vmlinux.
- The filename + line number is normalized and appended to the original name.
- The tag begins with '@' to indicate the symbol source.
- Not a change, but worth mentioning, since the alias is added to the existing
  list, the old duplicated name is preserved, and the livepatch way of dealing
  with duplicates is maintained.
- Acknowledging the existence of scenarios where inlined functions declared in
  header files may result in multiple copies due to compiler behavior, though
   it is not actionable as it does not pose an operational issue.
- Highlighting a single exception where the same name refers to different
  functions: the case of "compat_binfmt_elf.c," which directly includes
  "binfmt_elf.c" producing identical function copies in two separate
  modules.

sample from new v3

 ~ # cat /proc/kallsyms | grep gic_mask_irq
 ffffd0b03c04dae4 t gic_mask_irq
 ffffd0b03c04dae4 t gic_mask_irq@_drivers_irqchip_irq-gic_c_167
 ffffd0b03c050960 t gic_mask_irq
 ffffd0b03c050960 t gic_mask_irq@_drivers_irqchip_irq-gic-v3_c_404
 ~ #

https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@gmail.com/ (local)

Signed-off-by: Alessandro Carminati (Red Hat) <redacted>
---
 init/Kconfig                        |  36 ++++
 scripts/Makefile                    |   4 +
 scripts/kas_alias/Makefile          |   4 +
 scripts/kas_alias/a2l.c             | 268 ++++++++++++++++++++++++++++
 scripts/kas_alias/a2l.h             |  32 ++++
 scripts/kas_alias/duplicates_list.c |  70 ++++++++
 scripts/kas_alias/duplicates_list.h |  15 ++
 scripts/kas_alias/item_list.c       | 230 ++++++++++++++++++++++++
 scripts/kas_alias/item_list.h       |  26 +++
 scripts/kas_alias/kas_alias.c       | 217 ++++++++++++++++++++++
 scripts/link-vmlinux.sh             |  11 +-
 11 files changed, 910 insertions(+), 3 deletions(-)

I added some review comments in another thread, but
one of the biggest concerns might be "910 insertions".


What this program does is quite simple,
"find duplicated names, and call addr2line".



You wrote a lot of code to self-implement these:

 - sort function
 - parse PATH env variable to find addr2line
 - fork addr2line to establish pipe communications



Have you considered writing the code in Python (or Perl)?
Is it too slow?
I have attempted to incorporate all your suggestions.
I refactored the C code to utilize hashing instead of sorting, and I
completely re-implemented the entire thing in Python for the purpose of
comparison.

You are correct;
the C version is indeed faster, but the difference is negligible when
considering the use case and code maintainability.

Nice. Then, I prefer shorter code.

The Python implementation is 0.2 sec slower
(given the script is executed three times, 0.6 sec cost in total)
but it is not a big issue, I think.

Thanks.








Here's a direct comparison of the two.
~ $ time ./kas_alias.py -a /usr/bin/aarch64-linux-gnu-addr2line \
                      -n linux-6.5/.tmp_vmlinux.kallsyms1.syms \
                      -v linux-6.5/.tmp_vmlinux.kallsyms1 \
                      -o output_py

real    0m1.626s
user    0m1.436s
sys     0m0.185s
$ cat kas_alias.py | wc -l
133
~ $ time ./kas_alias -a /usr/bin/aarch64-linux-gnu-addr2line \
                   -v linux-6.5/.tmp_vmlinux.kallsyms1 \
                   -n linux-6.5/.tmp_vmlinux.kallsyms1.syms \
                   -o output_c

real    0m1.418s
user    0m1.262s
sys     0m0.162s
~ $ cat a2l.c a2l.h conf.c conf.h item_list.c item_list.h kas_alias.c | wc -l
742
~ $ diff output_py output_c
~ $
C version is 7/10% faster but is more than 5 times in terms of code size.
quoted
Most of the functions you implemented are already
available in script languages.



I am not sure if "@<file-path>" is a good solution,
but the amount of the added code looks too much to me.
I followed Francis's suggestion and made the separator between
<symbol name> and <normalized filename> an argument that you can select
using the command line. Since I'm not aware of a better choice, I set the
default value to '@'.
quoted



--
Best Regards
Masahiro Yamada
Best regards
Alessandro Carminati


-- 
Best Regards
Masahiro Yamada
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help