Re: [PATCH v6 3/4] scripts: add verifier script for builtin module range data
From: Masahiro Yamada <masahiroy@kernel.org>
Date: 2024-08-18 06:41:14
Also in:
linux-kbuild, linux-modules, lkml
On Fri, Aug 16, 2024 at 12:04 AM Kris Van Hees [off-list ref] wrote:
The modules.builtin.ranges offset range data for builtin modules is
generated at compile time based on the list of built-in modules and
the vmlinux.map and vmlinux.o.map linker maps. This data can be used
to determine whether a symbol at a particular address belongs to
module code that was configured to be compiled into the kernel proper
as a built-in module (rather than as a standalone module).

This patch adds a script that uses the generated modules.builtin.ranges
data to annotate the symbols in the System.map with module names if
their address falls within a range that belongs to one or more built-in
modules.

It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
verify the annotation:

  - For each top-level section:
     - For each object in the section:
        - Determine whether the object is part of a built-in module
          (using modules.builtin and the .*.cmd file used to compile
          the object as suggested in [0])
        - For each symbol in that object, verify that the built-in
          module association (or lack thereof) matches the annotation
          given to the symbol.

Signed-off-by: Kris Van Hees <redacted>
Reviewed-by: Nick Alcock <redacted>
Reviewed-by: Alan Maguire <redacted>
---
Changes since v5:
 - Added optional 6th argument to specify kernel build directory.
 - Report error and exit if .*.o.cmd files cannot be read.

Changes since v4:
 - New patch in the series
---
 scripts/verify_builtin_ranges.awk | 365 ++++++++++++++++++++++++++++++
 1 file changed, 365 insertions(+)
 create mode 100755 scripts/verify_builtin_ranges.awk

diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
new file mode 100755
index 000000000000..b82cf0a0fbeb
--- /dev/null
+++ b/scripts/verify_builtin_ranges.awk
@@ -0,0 +1,365 @@
+#!/usr/bin/gawk -f
+# SPDX-License-Identifier: GPL-2.0
+# verify_builtin_ranges.awk: Verify address range data for builtin modules
+# Written by Kris Van Hees <kris.van.hees@oracle.com>
+#
+# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
+#				   modules.builtin vmlinux.map vmlinux.o.map \
+#				   [ <build-dir> ]
+#
+
+# Return the module name(s) (if any) associated with the given object.
+#
+# If we have seen this object before, return information from the cache.
+# Otherwise, retrieve it from the corresponding .cmd file.
+#
+function get_module_info(fn, mod, obj, mfn, s) {
+	if (fn in omod)
+		return omod[fn];
+
+	if (match(fn, /\/[^/]+$/) == 0)
+		return "";
+
+	obj = fn;
+	mod = "";
+	mfn = "";
+	fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
+	if (getline s <fn == 1) {
+		if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
+			mfn = substr(s, RSTART + 16, RLENGTH - 16);
+			gsub(/['"]/, "", mfn);
+
+			mod = mfn;
+			gsub(/([^/ ]*\/)+/, "", mod);
+			gsub(/-/, "_", mod);
+		}
+	} else {
+		print "ERROR: Failed to read: " fn "\n\n" \
+		      "  Invalid kernel build directory (" kdir ")\n" \
+		      "  or its content does not match " ARGV[1] >"/dev/stderr";
+		close(fn);
+		total = 0;
+		exit(1);
+	}
+	close(fn);
+
+	# A single module (common case) also reflects objects that are not part
+	# of a module. Some of those objects have names that are also a module
+	# name (e.g. core). We check the associated module file name, and if
+	# they do not match, the object is not part of a module.
+	if (mod !~ / /) {
+		if (!(mod in mods))
+			return "";
+		if (mods[mod] != mfn)
+			return "";
+	}
+
+	# At this point, mod is a single (valid) module name, or a list of
+	# module names (that do not need validation).
+	omod[obj] = mod;
+	close(fn);
+
+	return mod;
+}

This code is copy-pasted from scripts/generate_builtin_ranges.awk,
so my comments on 2/4 apply to this patch, too.
Instead of adding a separate script,
we could add a "verify mode" option:
  scripts/generate_builtin_ranges.awk --verify ...
But I do not know how much cleaner that would be.
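As a rough illustration of the single-script idea, here is a minimal sketch that uses Python's argparse rather than AWK. The `--verify` flag, the `inputs` argument, and the `build_parser`/`main` helpers are hypothetical names chosen for this example; none of them is an existing interface of scripts/generate_builtin_ranges.awk.

```python
import argparse


def build_parser():
    """Hypothetical command line: one script, two modes."""
    parser = argparse.ArgumentParser(
        description="Generate or verify builtin module range data (sketch)")
    # Assumed flag name, taken from the "--verify" suggestion in this mail.
    parser.add_argument("--verify", action="store_true",
                        help="check existing range data instead of generating it")
    parser.add_argument("inputs", nargs="+",
                        help="input files, e.g. modules.builtin vmlinux.map ...")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    mode = "verify" if args.verify else "generate"
    # Both modes would share the same parsing code; only the last step differs.
    return mode, args.inputs
```

With this layout the map-parsing code exists once, and the mode only selects whether ranges are emitted or compared.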
I am not good at reviewing AWK code, but this
is the approach you have chosen.
If this script were written in Python,
it would be easy to split logically related
code chunks into readable functions,
as follows:
def parse_module_builtin():
    ...

def parse_vmlinux_map_lld():
    ...

def parse_vmlinux_map_bfd():
    ...

def parse_vmlinux_o_map():
    ...
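To make the skeleton above a little more concrete, here is a hedged sketch of how the first function and the quoted get_module_info() logic might look in Python. It assumes that modules.builtin lists one path per line in the form kernel/<dir>/<name>.ko and that a .cmd line carries -DKBUILD_MODFILE='"<path>"'. The helper `module_info_from_cmd` and the function signatures are made up for illustration and are not part of the patch.

```python
import re

# Same pattern as the quoted AWK code, with a capture group for the value.
_MODFILE_RE = re.compile(r"""DKBUILD_MODFILE=['"]+([^'"]+)""")


def parse_module_builtin(lines):
    """Map module name -> module file path (assumed modules.builtin format)."""
    mods = {}
    for line in lines:
        path = line.strip()
        if not path:
            continue
        if path.startswith("kernel/"):
            path = path[len("kernel/"):]
        if path.endswith(".ko"):
            path = path[:-len(".ko")]
        name = path.rsplit("/", 1)[-1].replace("-", "_")
        mods[name] = path
    return mods


def module_info_from_cmd(cmd_line):
    """Return (module_files, module_names) from one .cmd line, or ("", "")."""
    m = _MODFILE_RE.search(cmd_line)
    if m is None:
        return "", ""
    mfn = m.group(1)
    # Strip leading directories from every space-separated path, then
    # replace '-' with '_', as the quoted AWK code does.
    mod = re.sub(r"([^/ ]*/)+", "", mfn).replace("-", "_")
    return mfn, mod


def get_module_info(cmd_line, mods):
    """Return the module name(s) for an object, or "" if it is not in a module."""
    mfn, mod = module_info_from_cmd(cmd_line)
    # A single name must match a known builtin module and its file path;
    # a space-separated list is accepted without validation.
    if " " not in mod and mods.get(mod) != mfn:
        return ""
    return mod
```

The path comparison is what rejects objects such as kernel/sched/core, whose base name could otherwise be mistaken for a module called core.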
--
Best Regards
Masahiro Yamada