Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()
From: David Gibson <hidden>
Date: 2019-12-20 00:31:08
On Thu, Dec 19, 2019 at 02:08:29PM +0100, Cédric Le Goater wrote:
> On 19/12/2019 13:45, Michael Ellerman wrote:
> > "Jason A. Donenfeld" [off-list ref] writes:
> > > Hi folks,
> > >
> > > I'm actually still experiencing this sporadically in the WireGuard
> > > test suite, which you can see being run on
> > > https://build.wireguard.com/ .
> >
> > Fancy dashboard you got there :)
> >
> > > About 50% of the time the powerpc64 build will fail at a place like this:
> > >
> > > [ 65.147823] Oops: Exception in kernel mode, sig: 4 [#1]
> > > [ 65.149198] LE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=4 NUMA pSeries
> > > [ 65.149595] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1+ #1
> > > [ 65.149745] NIP: c000000000033330 LR: c00000000007eda0 CTR: c00000000007ed80
> > > [ 65.149934] REGS: c000000000a47970 TRAP: 0700 Not tainted (5.5.0-rc1+)
> > > [ 65.150032] MSR: 800000000288b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48008288 XER: 00000000
> > > [ 65.150352] CFAR: c0000000000332bc IRQMASK: 1
> > > [ 65.150352] GPR00: c000000000036508 c000000000a47c00 c000000000a4c100 0000000000000001
> > > [ 65.150352] GPR04: c000000000a50998 0000000000000000 c000000000a50908 000000000f509000
> > > [ 65.150352] GPR08: 0000000028000000 0000000000000000 0000000000000000 c00000000ff24f00
> > > [ 65.150352] GPR12: c00000000007ed80 c000000000ad9000 0000000000000000 0000000000000000
> > > [ 65.150352] GPR16: 00000000008c9190 00000000008c94a8 00000000008c92f8 00000000008c98b0
> > > [ 65.150352] GPR20: 00000000008f2f88 fffffffffffffffd 0000000000000014 0000000000e6c100
> > > [ 65.150352] GPR24: 0000000000e6c100 0000000000000001 0000000000000000 c000000000a50998
> > > [ 65.150352] GPR28: c000000000a9e280 c000000000a50aa4 0000000000000002 0000000000000000
> > > [ 65.151591] NIP [c000000000033330] doorbell_try_core_ipi+0xd0/0xf0
> > > [ 65.151750] LR [c00000000007eda0] smp_pseries_cause_ipi+0x20/0x70
> > > [ 65.151913] Call Trace:
> > > [ 65.152109] [c000000000a47c00] [c0000000000c7c9c] _nohz_idle_balance+0xbc/0x300 (unreliable)
> > > [ 65.152370] [c000000000a47c30] [c000000000036508] smp_send_reschedule+0x98/0xb0
> > > [ 65.152711] [c000000000a47c50] [c0000000000c1634] kick_ilb+0x114/0x140
> > > [ 65.152962] [c000000000a47ca0] [c0000000000c86d8] newidle_balance+0x4e8/0x500
> > > [ 65.153213] [c000000000a47d20] [c0000000000c8788] pick_next_task_fair+0x48/0x3a0
> > > [ 65.153424] [c000000000a47d80] [c000000000466620] __schedule+0xf0/0x430
> > > [ 65.153612] [c000000000a47de0] [c000000000466b04] schedule_idle+0x34/0x70
> > > [ 65.153786] [c000000000a47e10] [c0000000000c0bc8] do_idle+0x1a8/0x220
> > > [ 65.154121] [c000000000a47e70] [c0000000000c0e94] cpu_startup_entry+0x34/0x40
> > > [ 65.154313] [c000000000a47ea0] [c00000000000ef1c] rest_init+0x10c/0x124
> > > [ 65.154414] [c000000000a47ee0] [c000000000500004] start_kernel+0x568/0x594
> > > [ 65.154585] [c000000000a47f90] [c00000000000a7cc] start_here_common+0x1c/0x330
> > > [ 65.154854] Instruction dump:
> > > [ 65.155191] 38210030 e8010010 7c0803a6 4e800020 3d220004 39295228 81290000 3929ffff
> > > [ 65.155498] 7d284038 7c0004ac 5508017e 65082800 <7c00411c> e94d0178 812a0000 3929ffff
> >
> > ^ Again the faulting instruction there is "msgsndp r8"
> >
> > > [ 65.156155] ---[ end trace 6180d12e268ffdaf ]---
> > > [ 65.185452]
> > > [ 66.187490] Kernel panic - not syncing: Fatal exception
> > >
> > > This is with "qemu-system-ppc64 -smp 4 -machine pseries" on QEMU 4.0.0.
> > >
> > > I'm not totally sure what's going on here. I'm emulating a pseries,
> > > and using that with qemu's pseries model, and I see that selecting
> > > the pseries forces the selection of 'config PPC_DOORBELL' (twice in
> > > the same section, actually).
> >
> > Noted.
> >
> > > Then inside the kernel there appears to be some runtime CPU check
> > > for doorbell support.
> >
> > Not really. The kernel looks at the CPU revision (PVR) and decides
> > that it has doorbell support.
> >
> > > Is this a case in which QEMU is advertising doorbell support that
> > > TCG doesn't have? Or is something else happening here?
> >
> > It's a gap in the emulation I guess. qemu doesn't emulate msgsndp,
> > but it really should because that's a supported instruction since
> > Power8.
>
> There is a patch for msgsndp in my tree you could try :
>
>   https://github.com/legoater/qemu/tree/powernv-5.0
>
> Currently being reviewed. I have to address some remarks from David
> before it can be merged.

Right. It needs some polish, but I expect we'll have this merged in the not too distant future.
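
To make Michael's point above about the PVR concrete, here is a minimal sketch of a table-driven feature lookup. It is a simplified model, not the kernel's actual cputable code: the struct, the table contents, the FTR_DBELL bit and features_for_pvr() are all stand-ins invented for illustration, and the PVR values are only meant to be representative of POWER7 and POWER8 parts.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's CPU_FTR_DBELL bit (hypothetical value). */
#define FTR_DBELL (1u << 0)

/* Simplified model of one CPU table entry: match (pvr & mask) == value. */
struct cpu_entry {
	uint32_t pvr_mask;
	uint32_t pvr_value;
	uint32_t features;
	const char *name;
};

/* Illustrative table only; not copied from the kernel. */
static const struct cpu_entry cpu_table[] = {
	{ 0xffff0000u, 0x003f0000u, 0,         "POWER7 (illustrative)" },
	{ 0xffff0000u, 0x004d0000u, FTR_DBELL, "POWER8 (illustrative)" },
};

/*
 * Return the feature bits for a given PVR. Nothing here probes whether the
 * instruction really works: the decision is made from the PVR alone, which
 * is why an emulator reporting a POWER8 PVR is assumed to have msgsndp.
 */
static uint32_t features_for_pvr(uint32_t pvr)
{
	for (size_t i = 0; i < sizeof(cpu_table) / sizeof(cpu_table[0]); i++) {
		if ((pvr & cpu_table[i].pvr_mask) == cpu_table[i].pvr_value)
			return cpu_table[i].features;
	}
	return 0; /* unknown PVR: no optional features in this toy model */
}
```

In this model any PVR whose upper half matches the POWER8 entry gets FTR_DBELL, regardless of what the (emulated) CPU can actually execute.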
> > I suspect msgsndp wasn't implemented for TCG because TCG doesn't
> > support more than one thread per core, and you can only send
> > doorbells to other threads in the same core, and therefore there is
> > no reason to ever use msgsndp.
>
> There is a need now with KVM emulation under TCG, but, yes, QEMU still
> lacks SMT support.
>
> > That's the message Suraj mentioned up thread, eg:
> >
> >   $ qemu-system-ppc64 -nographic -vga none -M pseries -smp 2,threads=2 -cpu POWER8 -kernel build~/vmlinux
> >   qemu-system-ppc64: TCG cannot support more than 1 thread/core on a pseries machine
> >
> > But I guess we've hit another case of a CPU sending itself an IPI,
> > and the way the sibling masks are done, CPUs are siblings of
> > themselves, so the sibling test passes, eg:
> >
> > int doorbell_try_core_ipi(int cpu)
> > {
> >         int this_cpu = get_cpu();
> >         int ret = 0;
> >
> >         if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> >                 doorbell_core_ipi(cpu);
> >
> > In which case this patch should fix it.
> >
> > diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
> > index f17ff1200eaa..e45cb9bba193 100644
> > --- a/arch/powerpc/kernel/dbell.c
> > +++ b/arch/powerpc/kernel/dbell.c
> > @@ -63,7 +63,7 @@ int doorbell_try_core_ipi(int cpu)
> >          int this_cpu = get_cpu();
> >          int ret = 0;
> >
> > -        if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> > +        if (cpu != this_cpu && cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> >                  doorbell_core_ipi(cpu);
> >                  ret = 1;
> >          }
> >
> > The other option would be we disable CPU_FTR_DBELL if we detect we're
> > running under TCG. But I'm not sure we have a particularly clean way
> > to detect that.
>
> does the pseries kernel support cpufeatures in the DT ?

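
For anyone following along, the self-IPI case described above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: CPUs are numbered consecutively and grouped into cores of a fixed number of threads, a plain 64-bit word stands in for a cpumask, and sibling_mask(), would_ring_doorbell_old() and would_ring_doorbell_new() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy sibling mask: bit N set means CPU N shares a core with `cpu`.
 * Assumes consecutive numbering and at most 64 CPUs. Note that the
 * mask always contains `cpu` itself.
 */
static uint64_t sibling_mask(int cpu, int threads_per_core)
{
	assert(cpu >= 0 && cpu < 64);
	assert(threads_per_core > 0 && threads_per_core <= 64);

	int first = cpu - (cpu % threads_per_core);
	uint64_t mask = 0;

	for (int t = first; t < first + threads_per_core && t < 64; t++)
		mask |= UINT64_C(1) << t;
	return mask;
}

/* Sibling test as in the unpatched check: the target only has to be in the mask. */
static bool would_ring_doorbell_old(int this_cpu, int cpu, int threads_per_core)
{
	assert(cpu >= 0 && cpu < 64);
	return (sibling_mask(this_cpu, threads_per_core) >> cpu) & 1;
}

/* Same test with the proposed `cpu != this_cpu` guard in front. */
static bool would_ring_doorbell_new(int this_cpu, int cpu, int threads_per_core)
{
	return cpu != this_cpu &&
	       would_ring_doorbell_old(this_cpu, cpu, threads_per_core);
}
```

With one thread per core, as under TCG, would_ring_doorbell_old(0, 0, 1) is true, so a CPU sending itself an IPI takes the doorbell path, whereas would_ring_doorbell_new(0, 0, 1) is false and the caller would fall back to the other IPI mechanism. With two threads per core the guarded version still selects the doorbell for a real sibling, e.g. would_ring_doorbell_new(0, 1, 2).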
-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson