Thread: 4 messages, 3 authors, 2018-08-08

Re: [PATCH] powerpc/64s: Make unrecoverable SLB miss less confusing

From: Naveen N. Rao <hidden>
Date: 2018-08-08 14:43:00

Michael Ellerman wrote:
> Nicholas Piggin [off-list ref] writes:
>> On Thu, 26 Jul 2018 23:01:51 +1000
>> Michael Ellerman [off-list ref] wrote:
>>> If we take an SLB miss while MSR[RI]=0 we can't recover and have to
>>> oops. Currently this is reported by faking up a 0x4100 exception, eg:
>>>
>>>   Unrecoverable exception 4100 at 0
>>>   Oops: Unrecoverable exception, sig: 6 [#1]
>>>   ...
>>>   CPU: 0 PID: 1262 Comm: sh Not tainted 4.18.0-rc3-gcc-7.3.1-00098-g7fc2229fb2ab-dirty #9
>>>   NIP:  0000000000000000 LR: c00000000000b9e4 CTR: 00007fff8bb971b0
>>>   REGS: c0000000ee02bbb0 TRAP: 4100
>>>   ...
>>>   LR [c00000000000b9e4] system_call+0x5c/0x70
>>>
>>> The 0x4100 value was chosen back in 2004 as part of the fix for the
>>> "mega bug" - "ppc64: Fix SLB reload bug". Back then it was obvious
>>> that 0x4100 was not a real trap value, as the highest actual trap was
>>> less than 0x2000.
>>>
>>> Since then, however, the architecture has changed and now we have
>>> "virtual mode" or "relon" exceptions, in which exceptions can be
>>> delivered with the MMU on starting at 0x4000.
>>>
>>> At a glance 0x4100 looks like a virtual mode 0x100 exception, aka
>>> system reset exception. A close reading of the architecture will show
>>> that system reset exceptions can't be delivered in virtual mode, and
>>> so 0x4100 is not a valid trap number. But that's not immediately
>>> obvious. There's also nothing about 0x4100 that suggests SLB miss.
>>>
>>> So to make things a bit less confusing, switch to a fake but unique and
>>> hopefully more helpful numbering. For data SLB misses we report a
>>> 0x390 trap and for instruction SLB misses we report 0x490, compared to
>>> 0x380 and 0x480 for the actual data & instruction SLB exceptions.
>>>
>>> Also add a C handler that prints a more explicit message. The end
>>> result is something like:
>>>
>>>   Oops: Unrecoverable SLB miss (MSR[RI]=0), sig: 6 [#3]
>>
>> This is all good, but allow me to nitpick. Our unrecoverable
>> exception messages (and other messages, but those) are becoming a bit
>> ad-hoc and messy.
>>
>> It would be nice to go the other way eventually and consolidate them
>> into one. It would also be nice to have a common function that takes
>> regs and returns the name of the corresponding exception, to make
>> these more readable.
>
> Yeah, that's true, though some of them aren't simply a mapping from the
> trap number, e.g. the kernel bad stack one.
>
> But in general our whole oops output, regs, stack trace, etc. could use a
> revamp.
>
> I've been thinking of making the trap number more prominent and
> providing a text description, because apparently not everyone knows the
> trap numbers by heart :)

Yes please, guilty as charged :)
https://patchwork.ozlabs.org/patch/899980/
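
For illustration only, here is a minimal user-space sketch of the kind of helper discussed above: a common function that takes regs and returns a name for the exception, plus a formatter that prints the trap number next to its text description. `struct fake_regs`, `trap_name()` and `format_trap()` are hypothetical names, not kernel APIs, and this is not taken from the patch linked above. The table covers only a handful of Book3S-64 vectors plus the two fake SLB values (0x390/0x490) proposed in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct pt_regs: only the
 * trap field is modelled in this sketch. */
struct fake_regs {
	unsigned long trap;
};

/*
 * Hypothetical helper: return a human-readable name for regs->trap.
 * Only a few Book3S-64 vectors are listed, plus the two fake values
 * used to report an SLB miss taken with MSR[RI]=0.
 */
static const char *trap_name(const struct fake_regs *regs)
{
	assert(regs != NULL);

	switch (regs->trap) {
	case 0x100: return "System Reset";
	case 0x200: return "Machine Check";
	case 0x300: return "Data Storage";
	case 0x380: return "Data Segment (SLB miss)";
	case 0x390: return "Data SLB miss with MSR[RI]=0";        /* fake value */
	case 0x400: return "Instruction Storage";
	case 0x480: return "Instruction Segment (SLB miss)";
	case 0x490: return "Instruction SLB miss with MSR[RI]=0"; /* fake value */
	case 0x500: return "External Interrupt";
	case 0x600: return "Alignment";
	case 0x700: return "Program Check";
	case 0x900: return "Decrementer";
	case 0xc00: return "System Call";
	default:    return "Unknown";
	}
}

/* Format the trap number (hex, no 0x prefix, as in the oops output)
 * together with its description. Returns the value of snprintf(). */
static int format_trap(char *buf, size_t len, const struct fake_regs *regs)
{
	return snprintf(buf, len, "TRAP: %lx (%s)", regs->trap, trap_name(regs));
}
```

Passing regs rather than a bare trap number leaves room for the cases Michael mentions that aren't a pure trap mapping (such as the kernel bad stack oops), which a fuller version would have to handle separately. Note that the old fake 0x4100 value simply falls through to "Unknown" here.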

Thanks,
Naveen
