Thread (11 messages) 11 messages, 3 authors, 2021-05-08

Re: [BUG mips llvm] MIPS: malformed R_MIPS_{HI16,LO16} with LLVM

From: Alexander Lobakin <hidden>
Date: 2021-01-11 20:51:19
Also in: linux-arch, lkml

From: Nick Desaulniers <redacted>
Date: Mon, 11 Jan 2021 12:03:29 -0800
Hi Alexander,
I'm genuinely trying to reproduce/understand this report, questions below:
Hi Nick!
On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin [off-list ref] wrote:
quoted
From: Nick Desaulniers <redacted>
Date: Sat, 9 Jan 2021 09:50:44 -0800
quoted
On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin [off-list ref] wrote:
quoted
Machine: MIPS32 R2 Big Endian (interAptiv (multi))

While testing MIPS with LLVM, I found a weird and very rare bug with
MIPS relocs that LLVM emits into kernel modules. It happens on both
11.0.0 and latest git snapshot and applies, as I can see, only to
references to static symbols.

When the kernel loads the module, it allocates a space for every
section and then manually apply the relocations relative to the
new address.

Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
It's static and referenced only in phy_register_driver(), where it's
used to fill callback pointer in a structure.

The real function address after module loading is 0xc06c1444, that
is observed in its ELF st_value field.
There are two relocs related to this usage in phy_register_driver():

R_MIPS_HI16 refers to 0x3c010000
R_MIPS_LO16 refers to 0x24339444
Sorry, how are these calculated?  (Explicit shell commands invoked
would be appreciated)
Just look at the disassembly below and you'll see the similar
instructions in your module.
I'm doing:
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules
$ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe
$ llvm-objdump -Dr --disassemble-symbols=phy_driver_register drivers/net/phy/phy_device.o
$ llvm-readelf -r drivers/net/phy/phy_device.o  | grep -e R_MIPS_HI16 -e R_MIPS_LO16

for some of the commands trying to verify.
quoted
quoted
quoted
The address of .text is 0xc06b8000. So the destination is calculated
as follows:

0x00000000 from hi16;
0xffff9444 from lo16 (sign extend as it's always treated as signed);
0xc06b8000 from base.

= 0xc06b1444. The value is lower than the real phy_probe() address
(0xc06c1444) by 0x10000 and is lower than the base address of
module's .text, so it's 100% incorrect.
The disassembly for me produces:
    399c: 3c 03 00 00   lui     $3, 0 <phy_device_free>
                        0000399c:  R_MIPS_HI16  .text
...
    39a8: 24 63 3a 5c   addiu   $3, $3, 14940 <phy_probe>
                        000039a8:  R_MIPS_LO16  .text
So, in your case the values of the instructions that relocs refer are:

0x3c030000 R_MIPS_HI16
0x24633a5c R_MIPS_LO16

Mine were:

0x3c010000
0x24339444

Your second one doesn't have bit 15 set, so I think this pair won't
break the code.
Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have
'8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12].
I'm not really sure how to manually resolve the relocations; Fangrui
do you have any tips? (I'm coincidentally reading through Linkers &
Loaders currently, but only just started chpt. 4).
quoted
quoted
quoted
This results in:

[    2.204022] CPU 3 Unable to handle kernel paging request at virtual
address c06b1444, epc == c06b1444, ra == 803f1090

The correct instructions should be:

R_MIPS_HI16 0x3c010001
R_MIPS_LO16 0x24339444

so there'll be 0x00010000 from hi16.

I tried to catch those bugs in arch/mips/kernel/module.c (by checking
if the destination is lower than the base address, which should never
happen), and seems like I have only 3 such places in libphy.ko (and
one in nf_tables.ko).
I don't think it should be handled somehow in mentioned source code
as it would look rather ugly and may break kernels build with GNU
stack, which seems to not produce such bad codes.

If I should report this to any other resources, please let me know.
I chose clang-built-linux and LKML as it may not happen with userland
(didn't tried to catch).
Thanks for the report.  Sounds like we may indeed be producing an
incorrect relocation.  This is only seen for big endian triples?
Unfortunately I don't have a LE board to play with, so can confirm
only Big Endian.

(BTW, if someone can say if it's possible for MIPS (and how if it is)
to launch a LE kernel from BE-booted preloader and U-Boot, that would
be super cool)
quoted
Getting a way for us to deterministically reproduce would be a good
first step.  Which config or configs beyond defconfig, and which
relocations specifically are you observing this with?
I use `make 32r2_defconfig` which combines several configs from
arch/mips/configs:
 - generic_defconfig;
 - generic/32r2.config;
 - generic/eb.config.

Aside from that, I enable a bunch of my WIP drivers and the
Netfilter. On my setup, this bug is always present in libphy.ko,
so CONFIG_PHYLIB=m (with all deps) should be enough.

The three failed relocs belongs to this part of code: [0]

llvm-readelf on them:

Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:
[...]
00005740  00029305 R_MIPS_HI16            00000000   .text
00005744  00029306 R_MIPS_LO16            00000000   .text
00005720  00029305 R_MIPS_HI16            00000000   .text
00005748  00029306 R_MIPS_LO16            00000000   .text
0000573c  00029305 R_MIPS_HI16            00000000   .text
0000574c  00029306 R_MIPS_LO16            00000000   .text

The first pair is the one from my first mail:
0x3c010000 <-- should be 0x3c010001 to work properly
0x24339444

I'm planning to hunt for more now, will let you know.

[0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
quoted
Thanks,
~Nick Desaulniers
Thanks,
Al
Thanks,
~Nick Desaulniers
Thanks,
Al
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help