Re: Very unreliable booting (WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0) with PPC_EARLY_DEBUG_G5 set on a PowerMac G5, kernel 6.7-rc6
From: Erhard Furtner <hidden>
Date: 2023-12-31 18:44:45
On Thu, 21 Dec 2023 16:30:34 +1100 Michael Ellerman [off-list ref] wrote:
quoted
Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc6-PMacG5 #4 Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c000000000bbf320 LR: c0000000000ab040 CTR: 0000000000000000 REGS: c000000003103300 TRAP: 0700 Not tainted (6.7.0-rc6-PMacG5) MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24040840 XER: 20000000 IRQMASK: 1 GPR00: c0000000000ab068 c0000000031035a0 c000000000e43500 0000000000000000 GPR04: c00000000fffbeb0 c00000000fffbeb8 0000000000000000 000000047ef66000 GPR08: c00000047fe059b0 c000000000e9f9b0 c000000000e9f9c0 0000000000000000 GPR12: c000000000149e10 c00000000110a000 c00000000000dc10 0000000000000000 GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008 GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 000000047ef66000 GPR28: c000000003103650 c000000000e9a400 000000000000006e c000000003170000 NIP [c000000000bbf320] ct_nmi_exit+0xa0/0xc0 LR [c0000000000ab040] irq_exit+0x90/0x110 Call Trace: [c0000000031035a0] [c0000000000ab068] irq_exit+0xb8/0x110 (unreliable) [c0000000031035d0] [c0000000000196f0] timer_interrupt+0x1b0/0x370 [c000000003103620] [c0000000000089f4] decrementer_common_virt+0x214/0x220--- interrupt: 900 at real_readb+0x44/0x68NIP: c00000000001f24c LR: c000000000074258 CTR: c0000000000087c0 REGS: c000000003103650 TRAP: 0900 Not tainted (6.7.0-rc6-PMacG5) MSR: 9000000000009022 <SF,HV,EE,ME,IR,RI> CR: 24040242 XER: 20000000 IRQMASK: 0 GPR00: 9000000000009022 c0000000031038f0 c000000000e43500 0000000000000071 GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032 GPR08: 0000000000000000 c0000000010d3500 000000047ef66000 0000000052454753 GPR12: 9000000000009032 c00000000110a000 c00000000000dc10 0000000000000000 GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008 GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 0000000000000001 GPR28: 000000007db14ba6 0000000000000000 000000000000006e c0000000010d3b18 NIP [c00000000001f24c] real_readb+0x44/0x68 LR [c000000000074258] udbg_real_scc_putc+0x38/0x80Looks like you were spinning waiting for the serial port to become ready, long enough that a decrementer came in. That seems kind of odd, but then the system managed to boot, so I guess it resolved itself.
Tried without the serial port card inserted and without serial debug boot parameters but I still get it on 6.7-rc7 on the rare occasions the machine successfully finishes booting: [...] smp: Bringing up secondary CPUs ... ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc7-PMacG5-dirty #3 Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c000000000be4d20 LR: c0000000000ab060 CTR: c000000000019290 REGS: c0000000031032f0 TRAP: 0700 Not tainted (6.7.0-rc7-PMacG5-dirty) MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 24040844 XER: 20000000 IRQMASK: 3 GPR00: c0000000000aaff4 c000000003103590 c000000000e73b00 0000000000000000 GPR04: c000000003103570 c000000003103578 0000000000000000 000000047ef35000 GPR08: c00000047fe059b0 c000000000ed09b0 c000000000ed09c0 0000000000000000 GPR12: c000000000019290 c00000000114a000 c00000000000dc10 0000000000000000 GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008 GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 000000047ef35000 GPR28: c000000003103640 c000000000ecb400 c000000001113b18 c000000003170000 NIP [c000000000be4d20] ct_nmi_exit+0xa0/0xc0 LR [c0000000000ab060] irq_exit+0x90/0x110 Call Trace: [c000000003103590] [c0000000000aaff4] irq_exit+0x24/0x110 (unreliable) [c0000000031035c0] [c0000000000196f0] timer_interrupt+0x1b0/0x370 [c000000003103610] [c0000000000089f4] decrementer_common_virt+0x214/0x220
--- interrupt: 900 at real_readb+0x44/0x68NIP: c00000000001f24c LR: c000000000074270 CTR: 0000000000000000 REGS: c000000003103640 TRAP: 0900 Not tainted (6.7.0-rc7-PMacG5-dirty) MSR: 9000000000009022 <SF,HV,EE,ME,IR,RI> CR: 24040244 XER: 20000000 IRQMASK: 0 GPR00: 9000000000009022 c0000000031038e0 c000000000e73b00 0000000000000071 GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032 GPR08: 0000000000000000 c000000001113b00 0000000000000004 0000000000000000 GPR12: c00000000006bca0 c00000000114a000 c00000000000dc10 0000000000000000 GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008 GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 0000000000000001 GPR28: 000000007db14ba6 000000000000006f c000000001113b18 00000000000026e8 NIP [c00000000001f24c] real_readb+0x44/0x68 LR [c000000000074270] udbg_real_scc_putc+0x50/0xa0
--- interrupt: 900[c0000000031038e0] [c000000000968ba4] of_get_cpu_hwid+0x54/0x120 (unreliable) [c000000003103920] [c0000000000743e0] udbg_adb_putc+0x30/0x50 [c000000003103940] [c00000000001e7d4] udbg_puts+0x64/0xb0 [c000000003103970] [c000000000e76124] udbg_progress+0x18/0x30 [c000000003103990] [c000000000075a40] smp_core99_kick_cpu+0x140/0x180 [c000000003103a10] [c00000000003371c] __cpu_up+0x12c/0x3c0 [c000000003103ad0] [c0000000000a3088] bringup_cpu+0x68/0x200 [c000000003103b30] [c0000000000a1bf0] cpuhp_invoke_callback+0x170/0x300 [c000000003103b80] [c0000000000a45b8] _cpu_up.constprop.0+0x308/0x730 [c000000003103c10] [c0000000000a4b0c] cpu_up+0x12c/0x180 [c000000003103ca0] [c000000000e8372c] bringup_nonboot_cpus+0x7c/0xec [c000000003103cf0] [c000000000e8a31c] smp_init+0x40/0xa4 [c000000003103d50] [c000000000e71300] kernel_init_freeable+0x188/0x358 [c000000003103de0] [c00000000000dc38] kernel_init+0x30/0x158 [c000000003103e50] [c00000000000bf94] ret_from_kernel_user_thread+0x14/0x1c
--- interrupt: 0 at 0x0Code: 39200000 38600004 f9280010 4bfffd4d e92d0128 38210020 81290000 e8010010 7c0803a6 4e800020 60000000 60000000 <0fe00000> 4bffff9c 60000000 60000000 ---[ end trace 0000000000000000 ]---
quoted hunk ↗ jump to hunk
I don't see any obvious clues to why it's booting sometimes and not others. Maybe when it doesn't boot it's getting stuck forever in that real_readb() loop? We could try adding a maximum retry count, eg something like:diff --git a/arch/powerpc/platforms/powermac/udbg_scc.c b/arch/powerpc/platforms/powermac/udbg_scc.c index 1b7c39e841ee..b28c9a729fbf 100644 --- a/arch/powerpc/platforms/powermac/udbg_scc.c +++ b/arch/powerpc/platforms/powermac/udbg_scc.c@@ -165,8 +165,11 @@ void __init udbg_scc_init(int force_scc) #ifdef CONFIG_PPC64 static void udbg_real_scc_putc(char c) { - while ((real_readb(sccc) & SCC_TXRDY) == 0) - ; + int i = 0; + + while (((real_readb(sccc) & SCC_TXRDY) == 0) && i < 10000) + i++; + real_writeb(c, sccd); if (c == '\n') udbg_real_scc_putc('\r');But that's just a total stab in the dark :)
Thanks for patch! But unfortunately it does not seem to make a difference. Also I set IRQ_ALL_CPUS=n but that does not change a thing either. dmesg of v6.7-rc7 w. your patch attached. Regards, Erhard
Attachments
- config_67-rc7_g5- [application/octet-stream] 102548 bytes
- dmesg_67-rc7_g5 [application/octet-stream] 50652 bytes