Re: [BUG mips llvm] MIPS: malformed R_MIPS_{HI16,LO16} with LLVM
From: Alexander Lobakin <hidden>
Date: 2021-01-11 20:51:19
Also in:
linux-arch, lkml
From: Nick Desaulniers <redacted> Date: Mon, 11 Jan 2021 12:03:29 -0800
Hi Alexander, I'm genuinely trying to reproduce/understand this report, questions below:
Hi Nick!
On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin [off-list ref] wrote:quoted
From: Nick Desaulniers <redacted> Date: Sat, 9 Jan 2021 09:50:44 -0800quoted
On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin [off-list ref] wrote:quoted
Machine: MIPS32 R2 Big Endian (interAptiv (multi)) While testing MIPS with LLVM, I found a weird and very rare bug with MIPS relocs that LLVM emits into kernel modules. It happens on both 11.0.0 and latest git snapshot and applies, as I can see, only to references to static symbols. When the kernel loads the module, it allocates a space for every section and then manually apply the relocations relative to the new address. Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko. It's static and referenced only in phy_register_driver(), where it's used to fill callback pointer in a structure. The real function address after module loading is 0xc06c1444, that is observed in its ELF st_value field. There are two relocs related to this usage in phy_register_driver(): R_MIPS_HI16 refers to 0x3c010000 R_MIPS_LO16 refers to 0x24339444Sorry, how are these calculated? (Explicit shell commands invoked would be appreciated)
Just look at the disassembly below and you'll see the similar instructions in your module.
I'm doing: $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules $ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe $ llvm-objdump -Dr --disassemble-symbols=phy_driver_register drivers/net/phy/phy_device.o $ llvm-readelf -r drivers/net/phy/phy_device.o | grep -e R_MIPS_HI16 -e R_MIPS_LO16 for some of the commands trying to verify.quoted
quoted
quoted
The address of .text is 0xc06b8000. So the destination is calculated as follows: 0x00000000 from hi16; 0xffff9444 from lo16 (sign extend as it's always treated as signed); 0xc06b8000 from base. = 0xc06b1444. The value is lower than the real phy_probe() address (0xc06c1444) by 0x10000 and is lower than the base address of module's .text, so it's 100% incorrect.The disassembly for me produces: 399c: 3c 03 00 00 lui $3, 0 <phy_device_free> 0000399c: R_MIPS_HI16 .text ... 39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe> 000039a8: R_MIPS_LO16 .text
So, in your case the values of the instructions that relocs refer are: 0x3c030000 R_MIPS_HI16 0x24633a5c R_MIPS_LO16 Mine were: 0x3c010000 0x24339444 Your second one doesn't have bit 15 set, so I think this pair won't break the code. Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have '8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12].
I'm not really sure how to manually resolve the relocations; Fangrui do you have any tips? (I'm coincidentally reading through Linkers & Loaders currently, but only just started chpt. 4).quoted
quoted
quoted
This results in: [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual address c06b1444, epc == c06b1444, ra == 803f1090 The correct instructions should be: R_MIPS_HI16 0x3c010001 R_MIPS_LO16 0x24339444 so there'll be 0x00010000 from hi16. I tried to catch those bugs in arch/mips/kernel/module.c (by checking if the destination is lower than the base address, which should never happen), and seems like I have only 3 such places in libphy.ko (and one in nf_tables.ko). I don't think it should be handled somehow in mentioned source code as it would look rather ugly and may break kernels build with GNU stack, which seems to not produce such bad codes. If I should report this to any other resources, please let me know. I chose clang-built-linux and LKML as it may not happen with userland (didn't tried to catch).Thanks for the report. Sounds like we may indeed be producing an incorrect relocation. This is only seen for big endian triples?Unfortunately I don't have a LE board to play with, so can confirm only Big Endian. (BTW, if someone can say if it's possible for MIPS (and how if it is) to launch a LE kernel from BE-booted preloader and U-Boot, that would be super cool)quoted
Getting a way for us to deterministically reproduce would be a good first step. Which config or configs beyond defconfig, and which relocations specifically are you observing this with?I use `make 32r2_defconfig` which combines several configs from arch/mips/configs: - generic_defconfig; - generic/32r2.config; - generic/eb.config. Aside from that, I enable a bunch of my WIP drivers and the Netfilter. On my setup, this bug is always present in libphy.ko, so CONFIG_PHYLIB=m (with all deps) should be enough. The three failed relocs belongs to this part of code: [0] llvm-readelf on them: Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries: [...] 00005740 00029305 R_MIPS_HI16 00000000 .text 00005744 00029306 R_MIPS_LO16 00000000 .text 00005720 00029305 R_MIPS_HI16 00000000 .text 00005748 00029306 R_MIPS_LO16 00000000 .text 0000573c 00029305 R_MIPS_HI16 00000000 .text 0000574c 00029306 R_MIPS_LO16 00000000 .text The first pair is the one from my first mail: 0x3c010000 <-- should be 0x3c010001 to work properly 0x24339444 I'm planning to hunt for more now, will let you know. [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989quoted
Thanks, ~Nick DesaulniersThanks, AlThanks, ~Nick Desaulniers
Thanks, Al