Re: Crash in __do_IRQ with gcc 15
From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2025-05-12 03:31:44
Paul Mackerras [off-list ref] writes:
> Running Linux on Microwatt with a kernel compiled on an x86-64 system
> running Fedora 42 (using the packaged cross-compiler, i.e. the
> gcc-powerpcle64-linux-gnu package), I'm seeing a crash like this:
>
> [ 0.141591] smp: Bringing up secondary CPUs ...
> [ 0.167628] BUG: Unable to handle kernel data access on write at 0xc00a0000be8d6004
> [ 0.175409] Faulting instruction address: 0xc00000000000fcb4
> cpu 0x0: Vector: 300 (Data Access) at [c0000000012f78d0]
>     pc: c00000000000fcb4: __do_IRQ+0x64/0x84
>     lr: c00000000000fccc: __do_IRQ+0x7c/0x84
>     sp: c0000000012f7b70
>    msr: 9000000000001033
>    dar: c00a0000be8d6004
>  dsisr: 42000000
>   current = 0xc0000000012de000
>   paca = 0xc00000000135d000 irqmask: 0x03 irq_happened: 0x01
>     pid = 0, comm = swapper/0
> Linux version 6.15.0-rc1-00001-g72b73737d483-dirty (paulus@thinks) (powerpc64le-linux-gnu-gcc (GCC) 15.0.1 20250329 (Red Hat Cross 15.0.1-0), GNU ld version 2.44-1.fc42) #5 SMP Thu May 8 22:20:34 AEST 2025
> enter ? for help
> [c0000000012f7b70] c00000000000fd50 do_IRQ+0x7c/0x90 (unreliable)
> [c0000000012f7ba0] c000000000007db4 hardware_interrupt_common_virt+0x1c4/0x1d0
> --- Exception: 500 (Hardware Interrupt) at c00000000001c2ec arch_local_irq_restore+0x60/0xc4
> [c0000000012f7ea0] c000000000083c68 do_idle+0xd4/0xf4
> [c0000000012f7ee0] c000000000083e08 cpu_startup_entry+0x34/0x38
> [c0000000012f7f10] c00000000000cc7c kernel_init+0x0/0x144
> [c0000000012f7f40] c000000001000ecc do_one_initcall+0x0/0x160
> [c0000000012f7fe0] c00000000000ba6c start_here_common+0x1c/0x20
> 0:mon>
>
> What's happening is that gcc 15 seems to be using r2 as an ordinary
> register, and r2 has a live value in it at the point where __do_IRQ()
> calls call_do_irq(). Since r2 is not in the clobber list for the inline
> asm in call_do_irq(), it doesn't get saved and restored around the call
> to __do_irq(), and when we come back to __do_IRQ(), it has been
> modified. Then when __do_IRQ() subsequently does a store using r2, it
> blows up like the above.

Are you building with pcrel? Otherwise r2 shouldn't be getting used as an
ordinary register.

Can you show the disassembly of where it's getting used?

There was a change to r2 handling in GCC 15, but AFAICS it was meant to
only affect pcrel code. Still it's likely our bug because we are being
weird and calling a function inside an inline asm block.

cheers
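
For anyone following along, here is a minimal userspace sketch of what the clobber list in an inline asm statement is for. It is not the kernel's call_do_irq(); the names sum_across_asm() and self_check() are made up for illustration, and the scratch registers (rcx on x86-64, r9 on powerpc64) are arbitrary choices. The powerpc64 branch is an untested assumption, and other architectures skip the asm entirely. The point is only that a register the asm overwrites has to be named in the clobber list, otherwise the compiler is free to keep a live value in it across the statement, which is the failure mode described for r2 above.

```c
#include <assert.h>

/*
 * Hypothetical helper: computes a value, runs an asm statement that
 * overwrites one register, then uses the value again afterwards.
 */
static long sum_across_asm(long a, long b)
{
    /* The compiler may keep this in any register it likes. */
    long live = a * 3 + b;

#if defined(__x86_64__)
    /*
     * The asm zeroes rcx and changes the flags. Naming "rcx" and "cc"
     * as clobbers tells the compiler not to hold `live` in rcx across
     * this statement.
     */
    __asm__ volatile("xorl %%ecx, %%ecx" : : : "rcx", "cc");
#elif defined(__powerpc64__)
    /* Assumed equivalent for powerpc64: zero r9 and declare it clobbered. */
    __asm__ volatile("li 9, 0" : : : "r9");
#endif

    /* Still correct, because the overwritten register was declared. */
    return live + a;
}

/* Hypothetical self-check: 2*3 + 5 = 11, plus 2 gives 13. */
static void self_check(void)
{
    assert(sum_across_asm(2, 5) == 13);
}
```

Dropping the register from the clobber list would not necessarily break this small function, since the compiler may happen to pick a different register. That is also why a missing clobber can go unnoticed until a compiler change alters register allocation.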