Re: [PATCH] powerpc/64s: Make unrecoverable SLB miss less confusing
From: Naveen N. Rao <hidden>
Date: 2018-08-08 14:43:00
Michael Ellerman wrote:
> Nicholas Piggin [off-list ref] writes:
>> On Thu, 26 Jul 2018 23:01:51 +1000 Michael Ellerman [off-list ref] wrote:
>>> If we take an SLB miss while MSR[RI]=0 we can't recover and have to
>>> oops. Currently this is reported by faking up a 0x4100 exception, eg:
>>>
>>>   Unrecoverable exception 4100 at 0
>>>   Oops: Unrecoverable exception, sig: 6 [#1]
>>>   ...
>>>   CPU: 0 PID: 1262 Comm: sh Not tainted 4.18.0-rc3-gcc-7.3.1-00098-g7fc2229fb2ab-dirty #9
>>>   NIP: 0000000000000000 LR: c00000000000b9e4 CTR: 00007fff8bb971b0
>>>   REGS: c0000000ee02bbb0 TRAP: 4100
>>>   ...
>>>   LR [c00000000000b9e4] system_call+0x5c/0x70
>>>
>>> The 0x4100 value was chosen back in 2004 as part of the fix for the
>>> "mega bug" - "ppc64: Fix SLB reload bug". Back then it was obvious
>>> that 0x4100 was not a real trap value, as the highest actual trap was
>>> less than 0x2000.
>>>
>>> Since then however the architecture has changed and now we have
>>> "virtual mode" or "relon" exceptions, in which exceptions can be
>>> delivered with the MMU on starting at 0x4000.
>>>
>>> At a glance 0x4100 looks like a virtual mode 0x100 exception, aka
>>> system reset exception. A close reading of the architecture will show
>>> that system reset exceptions can't be delivered in virtual mode, and
>>> so 0x4100 is not a valid trap number. But that's not immediately
>>> obvious. There's also nothing about 0x4100 that suggests SLB miss.
>>>
>>> So to make things a bit less confusing switch to a fake but unique and
>>> hopefully more helpful numbering. For data SLB misses we report a
>>> 0x390 trap and for instruction we report 0x490. Compared to 0x380 and
>>> 0x480 for the actual data & instruction SLB exceptions.
>>>
>>> Also add a C handler that prints a more explicit message. The end
>>> result is something like:
>>>
>>>   Oops: Unrecoverable SLB miss (MSR[RI]=0), sig: 6 [#3]
>>
>> This is all good, but allow me to nitpick. Our unrecoverable exception
>> messages (and other messages, but those) are becoming a bit ad-hoc and
>> messy. It would be nice to go the other way eventually and consolidate
>> them into one. Would be nice to have a common function that takes regs
>> and returns the string of the corresponding exception name that makes
>> these more readable.
>
> Yeah that's true, though some of them aren't simply a mapping from the
> trap number, eg. the kernel bad stack one.
>
> But in general our whole oops output, regs, stack trace etc. could use a
> revamp.
>
> I've been thinking of making the trap number more prominent and
> providing a text description, because apparently not everyone knows the
> trap numbers by heart :)

Yes please, guilty as charged :)

https://patchwork.ozlabs.org/patch/899980/

Thanks,
Naveen
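
For illustration only, the "common function that returns the string of the corresponding exception name" idea from the quoted discussion could look roughly like the sketch below. It is not taken from the patch or from the kernel: `trap_to_name()`, `format_trap()` and the table are hypothetical names, the function takes a bare trap number rather than regs, and the table covers only a few vectors plus the fake 0x390/0x490 values proposed above. As noted in the thread, some cases (eg. kernel bad stack) are not a simple mapping from the trap number, so a lookup like this would only be part of the answer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mapping from a trap number to a readable name. */
struct trap_name {
	unsigned long trap;
	const char *name;
};

/* Illustrative subset only; 0x390 and 0x490 are the fake values from the patch. */
static const struct trap_name trap_names[] = {
	{ 0x100, "System Reset" },
	{ 0x200, "Machine Check" },
	{ 0x300, "Data Storage" },
	{ 0x380, "Data SLB miss" },
	{ 0x390, "Unrecoverable data SLB miss (MSR[RI]=0)" },
	{ 0x400, "Instruction Storage" },
	{ 0x480, "Instruction SLB miss" },
	{ 0x490, "Unrecoverable instruction SLB miss (MSR[RI]=0)" },
};

/* Return a name for the trap number, or "Unknown" if it is not in the table. */
const char *trap_to_name(unsigned long trap)
{
	size_t i;

	for (i = 0; i < sizeof(trap_names) / sizeof(trap_names[0]); i++) {
		if (trap_names[i].trap == trap)
			return trap_names[i].name;
	}
	return "Unknown";
}

/* Format the trap number together with its description into buf. */
int format_trap(char *buf, size_t len, unsigned long trap)
{
	return snprintf(buf, len, "TRAP: %lx (%s)", trap, trap_to_name(trap));
}
```

With this sketch, the old fake value 0x4100 falls through to "Unknown", while 0x390 and 0x490 get an explicit description next to the number.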