Re: [PATCH v3 04/12] module: Add printk format to add module build ID to stacktraces
From: Petr Mladek <pmladek@suse.com>
Date: 2021-04-07 14:54:41
Also in:
lkml
Subsystem:
library code, the rest · Maintainers:
Andrew Morton, Linus Torvalds
On Tue 2021-03-30 20:05:12, Stephen Boyd wrote:
Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stracktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space limited devices). Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussions on the list to put every module build ID into the "Modules linked in:" section of the stacktrace message but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it to multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed. Instead, we opt for the simpler approach of introducing a new printk format '%pS[R]b' for "pointer symbolic backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format. Example: WARNING: CPU: 3 PID: 3373 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm] Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE hci_uart <modules trimmed> CPU: 3 PID: 3373 Comm: bash Not tainted 5.11 #12 a8c0d47f7051f3e6670ceaea724af66a39c6cec8 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--) pc : lkdtm_WARNING+0x28/0x30 [lkdtm] lr : lkdtm_do_action+0x24/0x40 [lkdtm] sp : ffffffc013febca0 x29: ffffffc013febca0 x28: ffffff88d9438040 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000000 x24: ffffffdd0e9772c0 x23: 0000000000000020 x22: ffffffdd0e975366 x21: ffffffdd0e9772e0 x20: ffffffc013febde0 x19: 0000000000000008 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000037 x15: ffffffdd102ab174 x14: 0000000000000003 x13: 0000000000000004 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000001 x8 : ffffffdd0e979000 x7 : 0000000000000000 x6 : ffffffdd10ff6b54 x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc013feb938 x2 : ffffff89fef05a70 x1 : ffffff89feef5788 x0 : ffffffdd0e9772e0 Call trace: lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9] full_proxy_write+0x74/0xa4 vfs_write+0xec/0x2e8 ksys_write+0x84/0xf0 __arm64_sys_write+0x24/0x30 el0_svc_common+0xf4/0x1c0 do_el0_svc_compat+0x28/0x3c el0_svc_compat+0x10/0x1c el0_sync_compat_handler+0xa8/0xcc el0_sync_compat+0x178/0x180 ---[ end trace f89bc7f5417cbcc6 ]---
I used this test patch:
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 95a2f82427c7..06716948de3c 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c@@ -686,6 +686,7 @@ test_pointer(void) kernel_ptr(); struct_resource(); addr(); + BUG(); escaped_str(); hex_string(); mac();
Then I did # modprobe test_printf and got the following: [ 125.707281][ T1837] test_printf: loaded. [ 125.707707][ T1837] ------------[ cut here ]------------ [ 125.708227][ T1837] kernel BUG at lib/test_printf.c:689! [ 125.709102][ T1837] invalid opcode: 0000 [#1] SMP NOPTI [ 125.709111][ T1837] CPU: 3 PID: 1837 Comm: modprobe Kdump: loaded Tainted: G E 5.12.0-rc6-default+ #171 00000080ffffffff000000000000000000000000 [ 125.712102][ T1837] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 [ 125.712191][ T1837] RIP: 0010:test_printf_init+0x561/0xc99 [test_printf c2388ff0552611501b4d2ad58d8e5ca441d9a350] [ 125.712233][ T1837] Code: 00 48 c7 c7 b8 96 0f c0 e8 19 f9 ff ff b9 ab 00 00 00 48 c7 c2 93 96 0f c0 be 08 00 00 00 48 c7 c7 af 96 0f c0 e8 fc f8 ff ff <0f> 0b 8b 05 44 07 00 00 8b 35 3a 07 00 00 8b 1d 3c 07 00 00 85 c0 [ 125.712235][ T1837] RSP: 0018:ffffb060004abc78 EFLAGS: 00010282 [ 125.712240][ T1837] RAX: 0000000000000000 RBX: ffffb060004abc80 RCX: ffffd05fffc00b70 [ 125.712241][ T1837] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffad352fd5 [ 125.712242][ T1837] RBP: ffffffffc00fd367 R08: 0000000000000001 R09: 0000000000000001 [ 125.712243][ T1837] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b6cc49ffc00 [ 125.712244][ T1837] R13: 0000000000000001 R14: ffff9b6cc2ee0000 R15: ffffb060004abe90 [ 125.712247][ T1837] FS: 00007f70c4466b80(0000) GS:ffff9b6d3fc00000(0000) knlGS:0000000000000000 [ 125.712248][ T1837] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 125.712249][ T1837] CR2: 00007ffdb02e8b38 CR3: 000000010b6f2002 CR4: 0000000000370ee0 [ 125.712257][ T1837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 125.712257][ T1837] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 125.712258][ T1837] Call Trace: [ 125.712275][ T1837] ? __test+0x13c/0x149 [test_printf] [ 125.723404][ T1837] ? rcu_read_lock_sched_held+0x52/0x80 [ 125.723425][ T1837] do_one_initcall+0x5b/0x2d0 [ 125.724367][ T1837] do_init_module+0x5b/0x21c [ 125.724377][ T1837] load_module+0x1eaa/0x23c0 [ 125.725381][ T1837] ? show_modinfo_version+0x30/0x30 [ 125.725422][ T1837] ? __do_sys_finit_module+0xad/0x110 [ 125.725425][ T1837] __do_sys_finit_module+0xad/0x110 [ 125.725431][ T1837] do_syscall_64+0x33/0x40 [ 125.725464][ T1837] entry_SYSCALL_64_after_hwframe+0x44/0xae It shows wrong build id for vmlinux. And it does not show the build if for the module at all. readelf shows the following: # readelf -Wn /lib/modules/5.12.0-rc6-default+/kernel/lib/test_printf.ko Displaying notes found in: .note.gnu.build-id Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: c2388ff0552611501b4d2ad58d8e5ca441d9a350 Best Regards, Petr