Re: [PATCH v3] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms
From: Alessandro Carminati <hidden>
Date: 2023-09-02 07:41:29
Also in:
linux-kbuild, live-patching, lkml
Il giorno sab 2 set 2023 alle ore 08:36 Masahiro Yamada [off-list ref] ha scritto:
On Mon, Aug 28, 2023 at 8:45 PM Alessandro Carminati (Red Hat) [off-list ref] wrote:quoted
From: Alessandro Carminati <redacted> It is not uncommon for drivers or modules related to similar peripherals to have symbols with the exact same name. While this is not a problem for the kernel's binary itself, it becomes an issue when attempting to trace or probe specific functions using infrastructure like ftrace or kprobe. The tracing subsystem relies on the `nm -n vmlinux` output, which provides symbol information from the kernel's ELF binary. However, when multiple symbols share the same name, the standard nm output does not differentiate between them. This can lead to confusion and difficulty when trying to probe the intended symbol. ~ # cat /proc/kallsyms | grep " name_show" ffffffff8c4f76d0 t name_show ffffffff8c9cccb0 t name_show ffffffff8cb0ac20 t name_show ffffffff8cc728c0 t name_show ffffffff8ce0efd0 t name_show ffffffff8ce126c0 t name_show ffffffff8ce1dd20 t name_show ffffffff8ce24e70 t name_show ffffffff8d1104c0 t name_show ffffffff8d1fe480 t name_show **kas_alias** addresses this challenge by extending the symbol names with unique suffixes during the kernel build process. The newly created aliases for these duplicated symbols are unique names that can be fed to the ftracefs interface. By doing so, it enables previously unreachable symbols to be probed. ~ # cat /proc/kallsyms | grep " name_show" ffffffff974f76d0 t name_show ffffffff974f76d0 t name_show__alias__6340 ffffffff979cccb0 t name_show ffffffff979cccb0 t name_show__alias__6341 ffffffff97b0ac20 t name_show ffffffff97b0ac20 t name_show__alias__6342 ffffffff97c728c0 t name_show ffffffff97c728c0 t name_show__alias__6343 ffffffff97e0efd0 t name_show ffffffff97e0efd0 t name_show__alias__6344 ffffffff97e126c0 t name_show ffffffff97e126c0 t name_show__alias__6345 ffffffff97e1dd20 t name_show ffffffff97e1dd20 t name_show__alias__6346 ffffffff97e24e70 t name_show ffffffff97e24e70 t name_show__alias__6347 ffffffff981104c0 t name_show ffffffff981104c0 t name_show__alias__6348 ffffffff981fe480 t name_show ffffffff981fe480 t name_show__alias__6349 ~ # echo "p:kprobes/evnt1 name_show__alias__6349" \ > >/sys/kernel/tracing/kprobe_events ~ # cat /sys/kernel/tracing/kprobe_events p:kprobes/evnt1 name_show__alias__6349 Changes from v1: - Integrated changes requested by Masami to exclude symbols with prefixes "_cfi" and "_pfx". - Introduced a small framework to handle patterns that need to be excluded from the alias production. - Excluded other symbols using the framework. - Introduced the ability to discriminate between text and data symbols. - Added two new config symbols in this version: CONFIG_KALLSYMS_ALIAS_DATA, which allows data for data, and CONFIG_KALLSYMS_ALIAS_DATA_ALL, which excludes all filters and provides an alias for each duplicated symbol. https://lore.kernel.org/all/20230711151925.1092080-1-alessandro.carminati@gmail.com/ (local) Changes from v2: - Alias tags are created by querying DWARF information from the vmlinux. - The filename + line number is normalized and appended to the original name. - The tag begins with '@' to indicate the symbol source. - Not a change, but worth mentioning, since the alias is added to the existing list, the old duplicated name is preserved, and the livepatch way of dealing with duplicates is maintained. - Acknowledging the existence of scenarios where inlined functions declared in header files may result in multiple copies due to compiler behavior, though it is not actionable as it does not pose an operational issue. - Highlighting a single exception where the same name refers to different functions: the case of "compat_binfmt_elf.c," which directly includes "binfmt_elf.c" producing identical function copies in two separate modules. sample from new v3 ~ # cat /proc/kallsyms | grep gic_mask_irq ffffd0b03c04dae4 t gic_mask_irq ffffd0b03c04dae4 t gic_mask_irq@_drivers_irqchip_irq-gic_c_167 ffffd0b03c050960 t gic_mask_irq ffffd0b03c050960 t gic_mask_irq@_drivers_irqchip_irq-gic-v3_c_404 ~ # https://lore.kernel.org/all/20230714150326.1152359-1-alessandro.carminati@gmail.com/ (local) Signed-off-by: Alessandro Carminati (Red Hat) <redacted> --- init/Kconfig | 36 ++++ scripts/Makefile | 4 + scripts/kas_alias/Makefile | 4 + scripts/kas_alias/a2l.c | 268 ++++++++++++++++++++++++++++ scripts/kas_alias/a2l.h | 32 ++++ scripts/kas_alias/duplicates_list.c | 70 ++++++++ scripts/kas_alias/duplicates_list.h | 15 ++ scripts/kas_alias/item_list.c | 230 ++++++++++++++++++++++++ scripts/kas_alias/item_list.h | 26 +++ scripts/kas_alias/kas_alias.c | 217 ++++++++++++++++++++++ scripts/link-vmlinux.sh | 11 +- 11 files changed, 910 insertions(+), 3 deletions(-)I added some review comments in another thread, but one of the biggest concerns might be "910 insertions".
Based on the feedback I received in the reviews, I need to overhaul the code, potentially reducing its size. What would be a reasonable number of lines for this feature?
What this program does is quite simple, "find duplicated names, and call addr2line". You wrote a lot of code to self-implement these: - sort function - parse PATH env variable to find addr2line - fork addr2line to establish pipe communications
Some of these functions might become obsolete in the upcoming v4, which will certainly reduce the line count.
Have you considered writing the code in Python (or Perl)? Is it too slow?
From my perspective, there is a concern that using Python or Perl might result in slower performance. My proficiency in Python and Perl is limited, so I did not initially consider them as viable options for implementing this solution.
Most of the functions you implemented are already available in script languages. I am not sure if "@<file-path>" is a good solution, but the amount of the added code looks too much to me.
I had reservations about using the '@' symbol to decorate the alias because it's not a character commonly found in the kallsyms output. However, after careful consideration, I arrived at the conclusion that it was suitable for the task because it would make the alias stand-out and be easily identifiable. I'm open to any suggestions or alternative approaches you may have on this matter.
-- Best Regards Masahiro Yamada
Thank you