Thread (34 messages), 11 authors, 2012-08-25

Re: [PATCH v8] kvm: notify host when the guest is panicked

From: Marcelo Tosatti <hidden>
Date: 2012-08-15 01:41:46
Also in: lkml, qemu-devel

On Tue, Aug 14, 2012 at 05:59:06PM -0500, Anthony Liguori wrote:
Marcelo Tosatti [off-list ref] writes:
On Tue, Aug 14, 2012 at 02:35:34PM -0500, Anthony Liguori wrote:
Marcelo Tosatti [off-list ref] writes:
On Tue, Aug 14, 2012 at 01:53:01PM -0500, Anthony Liguori wrote:
Marcelo Tosatti [off-list ref] writes:
On Tue, Aug 14, 2012 at 05:55:54PM +0300, Yan Vugenfirer wrote:
On Aug 14, 2012, at 1:42 PM, Jan Kiszka wrote:
On 2012-08-14 10:56, Daniel P. Berrange wrote:
On Mon, Aug 13, 2012 at 03:21:32PM -0300, Marcelo Tosatti wrote:
On Wed, Aug 08, 2012 at 10:43:01AM +0800, Wen Congyang wrote:
We can know the guest is panicked when the guest runs on xen.
But we do not have such feature on kvm.

Another purpose of this feature is: management app(for example:
libvirt) can do auto dump when the guest is panicked. If management
app does not do auto dump, the guest's user can do dump by hand if
he sees the guest is panicked.

We have three solutions to implement this feature:
1. use vmcall
2. use I/O port
3. use virtio-serial.

We have decided to avoid touching the hypervisor. The reason why I chose
the I/O port is:
1. it is easier to implement
2. it does not depend on any virtual device
3. it can work when starting the kernel
How about searching for the "Kernel panic - not syncing" string 
in the guests serial output? Say libvirtd could take an action upon
that?
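
A minimal sketch of the serial-scan idea, assuming the guest console is
already captured as a text stream on the host. The function name
find_panic and the sample log lines are hypothetical; only the marker
string comes from the message above.

```python
import io

# Marker string quoted in the thread.
PANIC_MARKER = "Kernel panic - not syncing"


def find_panic(stream):
    """Return the first line containing the panic marker, or None."""
    for line in stream:
        if PANIC_MARKER in line:
            return line.rstrip("\n")
    return None


# Hypothetical captured serial output, for illustration only.
log = io.StringIO(
    "Booting Linux\n"
    "Kernel panic - not syncing: Fatal exception\n"
)
panic_line = find_panic(log)
```

This only works when the guest actually writes its console to a serial
port that the host can read.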
No, this is not satisfactory. It depends on the guest OS being
configured to use the serial port for console output which we
cannot mandate, since it may well be required for other purposes.
Please don't forget Windows guests, there is no console and no "Kernel Panic" string ;)

What I used for debugging purposes on Windows guest is to register a bugcheck callback in virtio-net driver and write 1 to VIRTIO_PCI_ISR register.

Yan. 
Considering whether a "panic-device" should cover other OSes is also
something to consider. Even for Linux, is "panic" the only case which
should be reported via the mechanism? What about oopses without panic? 

Is the mechanism general enough for supporting new events, etc.
Hi,

I think this discussion has gone off the deep end.

Forget about !x86 platforms.  They have their own way to do this sort of
thing.  
The panic function in kernel/panic.c has the following options, which
appear to be arch independent, on panic:

- reboot 
- blink
Not sure about the semantics of blink but that might be a good place for a
pvops hook.
None are paravirtual interfaces however.
Think of this feature like a status LED on a motherboard.  These
are very common and usually controlled by IO ports.

We're simply reserving a "status LED" for the guest to indicate that it
has panicked.  Let's not over-engineer this.
My concern is that you end up with state that is dependent on x86.

Subject: [PATCH v8 3/6] add a new runstate: RUN_STATE_GUEST_PANICKED

Having the ability to stop/restart the guest (and even introducing a 
new VM runstate) is more than a status LED analogy.
I must admit, I don't know why a new runstate is necessary/useful.  The
kernel shouldn't have to care about the difference between a halted guest
and a panicked guest.  That level of information belongs in userspace IMHO.
Can this new infrastructure be used by other architectures?
I guess I don't understand why the kernel side of this is anything
more than a paravirt op hook that does a single outb() with the
remaining logic handled 100% in QEMU.
From the patch description:

"Another purpose of this feature is: management app(for example:
libvirt) can do auto dump when the guest is panicked. If management
app does not do auto dump, the guest's user can do dump by hand if
he sees the guest is panicked."
Why does this mandate another runstate?  
Good question.
QEMU can simply mark the VCPUs as stopped and raise a QMP event.
Yes. As long as the management app is able to find out the reason
the VM has been stopped (that is, it's not an issue to lose the QMP
event).
The kernel doesn't care if the VCPUs
are stopped or panicked.
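
The flow discussed here (guest writes to an I/O port, userspace stops
the VCPUs, records why, and raises an event) can be sketched as a toy
model. This is not QEMU or kernel code: the port number, value, event
name and class are all made up for illustration.

```python
# Toy model only. All names and numbers below are hypothetical.
PANIC_PORT = 0x505   # hypothetical I/O port reserved for the "status LED"
PANIC_VALUE = 0x1    # hypothetical value meaning "guest panicked"


class ToyVM:
    def __init__(self, name):
        self.name = name
        self.vcpus_running = True
        self.stop_reason = None   # stays queryable if the event is lost
        self.events = []          # stands in for an event channel

    def io_write(self, port, value):
        """Handle a guest port write, as a userspace device model might."""
        if port == PANIC_PORT and value == PANIC_VALUE:
            self.vcpus_running = False
            self.stop_reason = "guest-panicked"
            self.events.append({"event": "EXAMPLE_GUEST_PANIC",
                                "vm": self.name})


vm = ToyVM("guest0")
vm.io_write(PANIC_PORT, PANIC_VALUE)   # the guest's single port write
```

Keeping stop_reason alongside the event reflects the point above: a
management app that misses the event can still ask why the VM stopped.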
Wen, auto dump means dump of guest memory?

In that case, the notification should obviously stop the guest 
otherwise the guest might be reset by the time memdump from QEMU 
monitor runs.

But kexec supports dumping of memory already (i suppose it can 
do automatic dump+{reboot,shutdown}).
Do you consider allowing support for Windows as overengineering?
I don't think there is a way to hook BSOD on Windows so attempting to
engineer something that works with Windows seems odd, no?
Unsure about hooking at BSOD time. But Windows has configurable 
memory dump/reset/reboot, so yes, it should not be necessary.
Do you mean it's not necessary to hook BSOD?
If all you need is dumping memory and rebooting the guest, then Windows
can do that automatically. Linux probably does; if not, it's possible
to make it do so.
I've very often gotten asked: We know 1 person is experiencing this
crash condition, can we figure out from the host how many other VMs are
experiencing this crash too instead of waiting for a user to complain?

That's the primary use-case for this notification IMHO.  Just a simple
status LED from the guest to indicate that it's in a bad state.
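
A minimal sketch of that host-side question ("how many other VMs are in
this state?"), assuming the host already keeps a per-VM stop reason.
The dictionary layout, the "guest-panicked" string and the function
name are hypothetical.

```python
def panicked_guests(stop_reasons):
    """Return the sorted names of VMs whose recorded stop reason is a panic."""
    return sorted(name for name, reason in stop_reasons.items()
                  if reason == "guest-panicked")


# Hypothetical host-side view: VM name -> last recorded stop reason.
fleet = {
    "web01": None,               # still running
    "web02": "guest-panicked",
    "db01": "paused-by-admin",
    "db02": "guest-panicked",
}
crashed = panicked_guests(fleet)
```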
That makes sense. But it appears to me that using an interface which is
not specific to x86 is interesting, so as to not require another
driver and matching QEMU code for other architectures. That is, 
for the "paravirtual status-LED-on-panic", there is no advantage 
in making every architecture different.

Also configuration of reboot-on-panic should override
panic-via-hypervisor (guest settings have priority over
panic-via-hypervisor).
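
The precedence suggested above can be written as a small decision
function. This is a sketch under assumptions: the parameter names and
the returned strings are hypothetical, and "spin" stands for the guest
simply staying in its panic loop.

```python
def panic_action(guest_reboot_on_panic, host_notify_enabled):
    """Pick what happens on panic; guest settings take priority."""
    if guest_reboot_on_panic:
        return "reboot"        # guest configuration wins
    if host_notify_enabled:
        return "notify-host"   # panic-via-hypervisor
    return "spin"              # neither configured


choice = panic_action(guest_reboot_on_panic=True, host_notify_enabled=True)
```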

For the usecase above (recording a critical event), it also makes sense
to support Windows.

Regards,

Anthony Liguori
Regards,

Anthony Liguori
Regards,

Anthony Liguori
Well, we have more than a single serial port, even when leaving
virtio-serial aside...

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html