Thread (10 messages) 10 messages, 4 authors, 2021-04-08

Re: [PATCH v3 04/12] module: Add printk format to add module build ID to stacktraces

From: Petr Mladek <pmladek@suse.com>
Date: 2021-04-07 14:54:41
Also in: lkml
Subsystem: library code, the rest · Maintainers: Andrew Morton, Linus Torvalds

On Tue 2021-03-30 20:05:12, Stephen Boyd wrote:
Let's make kernel stacktraces easier to identify by including the build
ID[1] of a module if the stacktrace is printing a symbol from a module.
This makes it simpler for developers to locate a kernel module's full
debuginfo for a particular stacktrace. Combined with
scripts/decode_stracktrace.sh, a developer can download the matching
debuginfo from a debuginfod[2] server and find the exact file and line
number for the functions plus offsets in a stacktrace that match the
module. This is especially useful for pstore crash debugging where the
kernel crashes are recorded in something like console-ramoops and the
recovery kernel/modules are different or the debuginfo doesn't exist on
the device due to space concerns (the debuginfo can be too large for
space limited devices).

Originally, I put this on the %pS format, but that was quickly rejected
given that %pS is used in other places such as ftrace where build IDs
aren't meaningful. There was some discussions on the list to put every
module build ID into the "Modules linked in:" section of the stacktrace
message but that quickly becomes very hard to read once you have more
than three or four modules linked in. It also provides too much
information when we don't expect each module to be traversed in a
stacktrace. Having the build ID for modules that aren't important just
makes things messy. Splitting it to multiple lines for each module
quickly explodes the number of lines printed in an oops too, possibly
wrapping the warning off the console. And finally, trying to stash away
each module used in a callstack to provide the ID of each symbol printed
is cumbersome and would require changes to each architecture to stash
away modules and return their build IDs once unwinding has completed.

Instead, we opt for the simpler approach of introducing a new printk
format '%pS[R]b' for "pointer symbolic backtrace with module build ID"
and then updating the few places in the architecture layer where the
stacktrace is printed to use this new format.

Example:

 WARNING: CPU: 3 PID: 3373 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
 Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE hci_uart <modules trimmed>
 CPU: 3 PID: 3373 Comm: bash Not tainted 5.11 #12 a8c0d47f7051f3e6670ceaea724af66a39c6cec8
 Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
 pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
 pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
 lr : lkdtm_do_action+0x24/0x40 [lkdtm]
 sp : ffffffc013febca0
 x29: ffffffc013febca0 x28: ffffff88d9438040
 x27: 0000000000000000 x26: 0000000000000000
 x25: 0000000000000000 x24: ffffffdd0e9772c0
 x23: 0000000000000020 x22: ffffffdd0e975366
 x21: ffffffdd0e9772e0 x20: ffffffc013febde0
 x19: 0000000000000008 x18: 0000000000000000
 x17: 0000000000000000 x16: 0000000000000037
 x15: ffffffdd102ab174 x14: 0000000000000003
 x13: 0000000000000004 x12: 0000000000000000
 x11: 0000000000000000 x10: 0000000000000000
 x9 : 0000000000000001 x8 : ffffffdd0e979000
 x7 : 0000000000000000 x6 : ffffffdd10ff6b54
 x5 : 0000000000000000 x4 : 0000000000000000
 x3 : ffffffc013feb938 x2 : ffffff89fef05a70
 x1 : ffffff89feef5788 x0 : ffffffdd0e9772e0
 Call trace:
  lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
  full_proxy_write+0x74/0xa4
  vfs_write+0xec/0x2e8
  ksys_write+0x84/0xf0
  __arm64_sys_write+0x24/0x30
  el0_svc_common+0xf4/0x1c0
  do_el0_svc_compat+0x28/0x3c
  el0_svc_compat+0x10/0x1c
  el0_sync_compat_handler+0xa8/0xcc
  el0_sync_compat+0x178/0x180
 ---[ end trace f89bc7f5417cbcc6 ]---
I used this test patch:
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 95a2f82427c7..06716948de3c 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -686,6 +686,7 @@ test_pointer(void)
        kernel_ptr();
        struct_resource();
        addr();
+       BUG();
        escaped_str();
        hex_string();
        mac();
Then I did

  # modprobe test_printf

and got the following:

[  125.707281][ T1837] test_printf: loaded.
[  125.707707][ T1837] ------------[ cut here ]------------
[  125.708227][ T1837] kernel BUG at lib/test_printf.c:689!
[  125.709102][ T1837] invalid opcode: 0000 [#1] SMP NOPTI
[  125.709111][ T1837] CPU: 3 PID: 1837 Comm: modprobe Kdump: loaded Tainted: G            E     5.12.0-rc6-default+ #171 00000080ffffffff000000000000000000000000
[  125.712102][ T1837] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  125.712191][ T1837] RIP: 0010:test_printf_init+0x561/0xc99 [test_printf c2388ff0552611501b4d2ad58d8e5ca441d9a350]
[  125.712233][ T1837] Code: 00 48 c7 c7 b8 96 0f c0 e8 19 f9 ff ff b9 ab 00 00 00 48 c7 c2 93 96 0f c0 be 08 00 00 00 48 c7 c7 af 96 0f c0 e8 fc f8 ff ff <0f> 0b 8b 05 44 07 00 00 8b 35 3a 07 00 00 8b 1d 3c 07 00 00 85 c0
[  125.712235][ T1837] RSP: 0018:ffffb060004abc78 EFLAGS: 00010282
[  125.712240][ T1837] RAX: 0000000000000000 RBX: ffffb060004abc80 RCX: ffffd05fffc00b70
[  125.712241][ T1837] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffad352fd5
[  125.712242][ T1837] RBP: ffffffffc00fd367 R08: 0000000000000001 R09: 0000000000000001
[  125.712243][ T1837] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b6cc49ffc00
[  125.712244][ T1837] R13: 0000000000000001 R14: ffff9b6cc2ee0000 R15: ffffb060004abe90
[  125.712247][ T1837] FS:  00007f70c4466b80(0000) GS:ffff9b6d3fc00000(0000) knlGS:0000000000000000
[  125.712248][ T1837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  125.712249][ T1837] CR2: 00007ffdb02e8b38 CR3: 000000010b6f2002 CR4: 0000000000370ee0
[  125.712257][ T1837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  125.712257][ T1837] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  125.712258][ T1837] Call Trace:
[  125.712275][ T1837]  ? __test+0x13c/0x149 [test_printf]
[  125.723404][ T1837]  ? rcu_read_lock_sched_held+0x52/0x80
[  125.723425][ T1837]  do_one_initcall+0x5b/0x2d0
[  125.724367][ T1837]  do_init_module+0x5b/0x21c
[  125.724377][ T1837]  load_module+0x1eaa/0x23c0
[  125.725381][ T1837]  ? show_modinfo_version+0x30/0x30
[  125.725422][ T1837]  ? __do_sys_finit_module+0xad/0x110
[  125.725425][ T1837]  __do_sys_finit_module+0xad/0x110
[  125.725431][ T1837]  do_syscall_64+0x33/0x40
[  125.725464][ T1837]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It shows wrong build id for vmlinux.
And it does not show the build if for the module at all.


readelf shows the following:

# readelf -Wn /lib/modules/5.12.0-rc6-default+/kernel/lib/test_printf.ko

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size       Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)         Build ID: c2388ff0552611501b4d2ad58d8e5ca441d9a350


Best Regards,
Petr
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help