Re: a3b2cb30 broken panic reporting for qemu guests

From: David Gibson <hidden>
Date: 2017-12-04 06:31:43

On Mon, Dec 04, 2017 at 04:12:06PM +1000, Nicholas Piggin wrote:

On Mon, 4 Dec 2017 16:49:14 +1100
David Gibson [off-list ref] wrote:

quoted

On Fri, Dec 01, 2017 at 09:40:38PM +1000, Nicholas Piggin wrote:

quoted

On Fri, 01 Dec 2017 22:11:50 +1100
Michael Ellerman [off-list ref] wrote:

quoted

David Gibson [off-list ref] writes:

quoted

On Wed, Nov 29, 2017 at 02:23:43PM +1000, Nicholas Piggin wrote:

quoted

On Wed, 29 Nov 2017 15:06:52 +1100
David Gibson [off-list ref] wrote:

quoted

a3b2cb30 "powerpc: Do not call ppc_md.panic in fadump panic notifier"
purports to fix a problem when the kernel panics with fadump not
registered, but it breaks something else instead.  I _think_ it was
working on the incorrect assumption that ppc_md.panic was (or should
be) only used with fadump, but I'm not really sure.

Panic works with kdump enabled, and (I think) with fadump enabled).
However, with neither of these enabled, we always go to the generic
panic logic.

Yeah thanks, I can't remember what assumption I was working on tbh.

quoted

That's incorrect for PAPR guests - they should call ibm,os-term via
RTAS.  Under qemu this leads to a "GUEST_PANICKED" event notification
which higher-level management pays attention to.  Since a3b2cb30 we
now reboot instead of reporting that.

I believe it will also break panic for PS3 machines, but since that
platform basically no longer exists, we probably don't care.

I (hope) it should just go down to the normal panic path and not do
much worse than it already does -- although it won't print out that
message.

quoted

I'm not entirely sure how to fix this.  I _think_ what we want is to
call ppc_md.panic from a late panic notifier, the way this patch does
for fadump_panic_event() if fadump is registered.

The problem I had there is that some of the printk and console stuff
wasn't getting flushed out, so I was getting a blank screen. This was
probably in conjunction with panicing from NMI context that we're now
starting to introduce.

So it's a bit annoying. There's other ugliness we have for being unable
to control panic code well enough from arch code
(arch/powerpc/platforms/powernv/opal.c)

I guess a really minimal fix is to put an #ifdef powerpc down the bottom
there (/me *cries*).

Um.. right.  I'm not really sure from that how to go forward from
here.  We want to fix this for RHEL7.5, which doesn't give us a lot of
time.

Adding the #ifdef at the bottom of the generic panic code is gross,
but there's already a bunch of that, so maybe adequate until a better
solution can be found?

I think you mean put an #ifdef at the bottom of panic(). If so that
won't work. Our default panic_timeout is 180 so we never get to the
bottom of panic(), we call emergency_restart().

You *could* put an #ifdef powerpc before that, but that's even more
gross because it's in a different place to the sparc/s390 #ifdefs.

I notice we don't implement machine_emergency_restart(), it just
becomes machine_restart(NULL).

So it seems like that's the place we should be hooking,
machine_emergency_restart(). That's what x86 does.

But panic() is not the only caller of emergency_restart(), so it's not
an entirely straight forward conversion.

So I think for 4.15 and 4.14 I'm inclined to revert. Then we can do a
bigger rework for 4.16.

Nick that shouldn't break your original aim too badly I think? ie. worst
case is we panic() but don't see the output if we came from NMI?

If the fix is not pretty obvious, then I guess revert. We actually
do have a bit of a regression though, since we've started marking
system reset interrupts as NMI. Previously a system reset would have
a better chance of printing something there.

So I wonder is an ifdef really all that much worse just because it's not
in the same exact place as the others? We do get bug reports that were
triggered by a system reset from hypervisor console. Hopefully they would
be running with crash dumps usually.

I don't think an ifdef there is really correct though.  Sounds like we
might instead need to move some console flushes before calling of all
the notifiers, so that things like platforms/pseries or pvpanic can
insert non-returning notifiers.

Well, I don't necessarily think that's the right thing to do either.
If we have a crash dump handler, we should get the console memory
and rather do that immediately than try to flush it, no?

I think a helper function that would do all the necessary generic
panic flushing might be better, than handlers can decide for
themselves.

But a pseries rtas hcall at the end of panic should be okay for a
backport, no? Revert would be easy, but if distros and things pick it
up then we might get a bunch of bugs without console data then
just have to backport something anway. That's my worry.

Hrm.  Well, sure, but I see a bunch of possibilities there without a
clear idea of what to do next.  I'm really inclined to revert then
re-fix the original problem once you figure out how.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachments

signature.asc [application/pgp-signature] 833 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help