Thread: 17 messages, 4 authors, 2024-08-24

Re: [PATCH v7 2/4] kbuild: generate offset range data for builtin modules

From: Masahiro Yamada <masahiroy@kernel.org>
Date: 2024-08-22 17:34:42
Also in: linux-kbuild, linux-modules, lkml

On Wed, Aug 21, 2024 at 1:11 PM Kris Van Hees [off-list ref] wrote:
Create file modules.builtin.ranges that can be used to find where
built-in modules are located by their addresses. This will be useful for
tracing tools to determine which functions belong to which built-in modules.

The offset range data for builtin modules is generated using:
 - modules.builtin: associates object files with module names
 - vmlinux.map: provides load order of sections and offset of first member
    per section
 - vmlinux.o.map: provides offset of object file content per section
 - .*.cmd: build cmd file with KBUILD_MODFILE

The generated data will look like:

.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 0009bd10-0009c8e0 iosf_mbi
...
.text 00b9f080-00ba011a intel_skl_int3472_discrete
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
...
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore

For each ELF section, it lists the offset of the first symbol.  This can
be used to determine the base address of the section at runtime.

Next, it lists (in strict ascending order) offset ranges in that section
that cover the symbols of one or more builtin modules.  Multiple ranges
can apply to a single module, and ranges can be shared between modules.
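
As an illustration of how a tracing tool might consume this data, here is a
minimal Python sketch. It is not part of the patch. The function names
(parse_ranges, modules_at) are hypothetical, and it assumes that the end of
each range is exclusive, which the sample output suggests (one range ends at
00ba03c0 and the next starts there) but the description does not state.

```python
import bisect

# A few lines in the format shown above.
SAMPLE = """\
.text 00000000-00000000 = _text
.text 0000baf0-0000cb10 amd_uncore
.text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
.data 00000000-00000000 = _sdata
.data 0000f020-0000f680 amd_uncore
"""


def parse_ranges(text):
    """Parse range data into a dict keyed by section name.

    Each value holds the anchor (symbol name, offset) and the list of
    (start, end, [module names]) ranges in file order.
    """
    sections = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or "-" not in fields[1]:
            continue  # skip "..." and anything that is not a range line
        start, end = (int(x, 16) for x in fields[1].split("-"))
        sec = sections.setdefault(
            fields[0], {"anchor": None, "starts": [], "ranges": []})
        if fields[2] == "=" and len(fields) >= 4:
            # Anchor line: offset of the first symbol in the section.
            sec["anchor"] = (fields[3], start)
        else:
            sec["starts"].append(start)
            sec["ranges"].append((start, end, fields[2:]))
    return sections


def modules_at(sections, section, anchor_addr, addr):
    """Return the built-in module names covering a runtime address.

    anchor_addr is the runtime address of the section's anchor symbol,
    so the section base is anchor_addr minus the anchor's offset.
    """
    sec = sections[section]
    if sec["anchor"] is None:
        raise ValueError("no anchor symbol for section " + section)
    base = anchor_addr - sec["anchor"][1]
    off = addr - base
    # Ranges are listed in ascending order, so a binary search works.
    i = bisect.bisect_right(sec["starts"], off) - 1
    if i >= 0 and off < sec["ranges"][i][1]:  # end treated as exclusive
        return sec["ranges"][i][2]
    return []
```

A caller would obtain anchor_addr from the running kernel's symbol table (for
example, the address of _text) and pass any address of interest; an empty list
means the address is not covered by a built-in module range.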

The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
is generated for kernel modules that are built into the kernel image.

How it works:

  1. The modules.builtin file is parsed to obtain a list of built-in
     module names and their associated object names (the .ko file that
     the module would be in if it were a loadable module, hereafter
     referred to as <kmodfile>).  This object name can be used to
     identify objects in the kernel compile because any C or assembler
     code that ends up in a built-in module will have the option
     -DKBUILD_MODFILE=<kmodfile> present in its build command, and those
     can be found in the .<obj>.cmd file in the kernel build tree.

     If an object is part of multiple modules, they will all be listed
     in the KBUILD_MODFILE option argument.

     This allows us to conclusively determine whether an object in the
     kernel build belongs to any modules, and which ones.

 2. The vmlinux.map is parsed next to determine the base address of each
    top level section so that all addresses into the section can be
    turned into offsets.  This makes it possible to handle sections
    getting loaded at different addresses at system boot.

    We also determine an 'anchor' symbol at the beginning of each
    section to make it possible to calculate the true base address of
    a section at runtime (i.e. symbol address - symbol offset).

    We collect start addresses of sections that are included in the top
    level section.  This is used when vmlinux is linked using vmlinux.o,
    because in that case, we need to look at the vmlinux.o linker map to
    know what object a symbol is found in.

    And finally, we process each symbol that is listed in vmlinux.map
    (or vmlinux.o.map) based on the following structure:

    vmlinux linked from vmlinux.a:

      vmlinux.map:
        <top level section>
          <included section>  -- (might be same as top level section)
            <object>          -- built-in association known
              <symbol>        -- belongs to module(s) object belongs to
              ...

    vmlinux linked from vmlinux.o:

      vmlinux.map:
        <top level section>
          <included section>  -- (might be same as top level section)
            vmlinux.o         -- need to use vmlinux.o.map
              <symbol>        -- ignored
              ...

      vmlinux.o.map:
        <section>
            <object>          -- built-in association known
              <symbol>        -- belongs to module(s) object belongs to
              ...

 3. As sections, objects, and symbols are processed, offset ranges are
    constructed in a straightforward way:

      - If the symbol belongs to one or more built-in modules:
          - If we were working on the same module(s), extend the range
            to include this object
          - If we were working on another module(s), close that range,
            and start the new one
      - If the symbol does not belong to any built-in modules:
          - If we were working on a module(s) range, close that range
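
The range construction in step 3 can be sketched in Python as follows. This
is only an illustration of the rules listed above, not the logic of
generate_builtin_ranges.awk itself. The function name build_ranges and the
(start, end, modules) input tuples are hypothetical, and the sketch assumes
the symbols of one section arrive in ascending offset order.

```python
def build_ranges(symbols):
    """Merge per-symbol module ownership into offset ranges.

    symbols: iterable of (start, end, modules) in ascending order, where
    modules is a sequence of built-in module names (empty if the symbol
    does not belong to any built-in module).
    Returns a list of (start, end, modules) ranges.
    """
    ranges = []
    cur = None  # the range being built: [start, end, modules]
    for start, end, mods in symbols:
        mods = tuple(mods)
        if mods and cur is not None and cur[2] == mods:
            # Same module(s) as the open range: extend it.
            cur[1] = end
            continue
        if cur is not None:
            # Different module(s), or no module at all: close the range.
            ranges.append(tuple(cur))
            cur = None
        if mods:
            # Start a new range for the new module(s).
            cur = [start, end, mods]
    if cur is not None:
        ranges.append(tuple(cur))
    return ranges
```

An object shared by two modules yields its own range listing both names,
which matches the shared range shown in the sample output.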

Signed-off-by: Kris Van Hees <redacted>
Reviewed-by: Nick Alcock <redacted>
Reviewed-by: Alan Maguire <redacted>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
    Changes since v6:
     - Applied Masahiro Yamada's suggestions (Kconfig, makefile, script).

    Changes since v5:
     - Removed unnecessary compatibility info from option description.

    Changes since v4:
     - Improved commit description to explain the why and how.
     - Documented dependency on GNU AWK for CONFIG_BUILTIN_MODULE_RANGES.
     - Improved comments in generate_builtin_ranges.awk
     - Improved logic in generate_builtin_ranges.awk to handle incorrect
       object size information in linker maps

    Changes since v3:
     - Consolidated patches 2 through 5 into a single patch
     - Move CONFIG_BUILTIN_MODULE_RANGES to Kconfig.debug
     - Make CONFIG_BUILTIN_MODULE_RANGES select CONFIG_VMLINUX_MAP
     - Disable CONFIG_BUILTIN_MODULE_RANGES if CONFIG_LTO_CLANG_(FULL|THIN)=y
     - Support LLVM (lld) compiles in generate_builtin_ranges.awk
     - Support CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y

    Changes since v2:
     - Add explicit dependency on FTRACE for CONFIG_BUILTIN_MODULE_RANGES
     - 1st arg to generate_builtin_ranges.awk is now modules.builtin.modinfo
     - Switched from using modules.builtin.objs to parsing .*.cmd files
     - Parse data from .*.cmd in generate_builtin_ranges.awk
     - Use $(real-prereqs) rather than $(filter-out ...)
---
 Documentation/process/changes.rst   |   7 +
 lib/Kconfig.debug                   |  16 +
 scripts/Makefile.vmlinux            |  18 +
 scripts/Makefile.vmlinux_o          |   3 +
 scripts/generate_builtin_ranges.awk | 506 ++++++++++++++++++++++++++++
 5 files changed, 550 insertions(+)
 create mode 100755 scripts/generate_builtin_ranges.awk
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 3fc63f27c226..00f1ed7c59c3 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -64,6 +64,7 @@ GNU tar                1.28             tar --version
 gtags (optional)       6.6.5            gtags --version
 mkimage (optional)     2017.01          mkimage --version
 Python (optional)      3.5.x            python3 --version
+GNU AWK (optional)     5.1.0            gawk --version
 ====================== ===============  ========================================

 .. [#f1] Sphinx is needed only to build the Kernel documentation
@@ -192,6 +193,12 @@ platforms. The tool is available via the ``u-boot-tools`` package or can be
 built from the U-Boot source code. See the instructions at
 https://docs.u-boot.org/en/latest/build/tools.html#building-tools-for-linux

+GNU AWK
+-------
+
+GNU AWK is needed if you want kernel builds to generate address range data for
+builtin modules (CONFIG_BUILTIN_MODULE_RANGES).
+
 System utilities
 ****************
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a30c03a66172..f087dc3da321 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -571,6 +571,22 @@ config VMLINUX_MAP
          pieces of code get eliminated with
          CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.

+config BUILTIN_MODULE_RANGES
+       bool "Generate address range information for builtin modules"
+       depends on !LTO_CLANG_FULL
+       depends on !LTO_CLANG_THIN

Forgot to mention this.

These two lines can be replaced with

         depends on !LTO





diff --git a/scripts/generate_builtin_ranges.awk b/scripts/generate_builtin_ranges.awk
new file mode 100755
index 000000000000..865cb7ac4970
--- /dev/null
+++ b/scripts/generate_builtin_ranges.awk
@@ -0,0 +1,506 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# generate_builtin_ranges.awk: Generate address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: generate_builtin_ranges.awk modules.builtin vmlinux.map \
+#              vmlinux.o.map > modules.builtin.ranges
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, s) {
+       if (fn in omod)
+               return omod[fn];
+
+       if (match(fn, /\/[^/]+$/) == 0)
+               return "";
+
+       obj = fn;
+       mod = "";
+       fn = substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+       if (getline s <fn == 1) {
+               if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+                       mod = substr(s, RSTART + 16, RLENGTH - 16);
+                       gsub(/['"]/, "", mod);
+               }
+       }
+       close(fn);
+
+       # A single module (common case) also reflects objects that are not part
+       # of a module.  Some of those objects have names that are also a module
+       # name (e.g. core).  We check the associated module file name, and if
+       # they do not match, the object is not part of a module.
+       if (mod !~ / /) {
+               if (!(mod in mods))
+                       mod = "";
+       }
+
+       gsub(/([^/ ]*\/)+/, "", mod);
+       gsub(/-/, "_", mod);
+
+       # At this point, mod is a single (valid) module name, or a list of
+       # module names (that do not need validation).
+       omod[obj] = mod;
+       close(fn);

I still see the second close(fn) here; fn was already closed right after
the getline above.









-- 
Best Regards
Masahiro Yamada