Thread: 5 messages, 3 authors, 2021-01-05

RE: [PATCH v2] x86/hyperv: Fix kexec panic/hang issues

From: Michael Kelley <hidden>
Date: 2021-01-05 16:40:38
Also in: lkml

From: Wei Liu <wei.liu@kernel.org> Sent: Tuesday, January 5, 2021 5:04 AM
On Mon, Dec 21, 2020 at 10:55:41PM -0800, Dexuan Cui wrote:
Currently the kexec kernel can panic or hang due to 2 causes:

1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
old VP Assist Pages when the kexec kernel runs. The same issue is fixed
for hibernation in commit 421f090c819d ("x86/hyperv: Suspend/resume the
VP assist page for hibernation"). Now fix it for kexec.

2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
can still try to access the hypercall page and cause panic. The workaround
"hv_hypercall_pg = NULL;" in hyperv_cleanup() is unreliable. Move
hyperv_cleanup() to a better place.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
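
The ordering problem in (2) can be illustrated with a small user-space model. This is only a sketch, not the patch itself: every model_* name and the event enum are hypothetical, and it assumes that the "better place" for hyperv_cleanup() is after the other CPUs have been stopped and that the hv_cpu_die() step runs before that point. Cleanup is reported as racy whenever more than one CPU is still online when it runs.

```c
#include <assert.h>

/* Hypothetical user-space model of the shutdown ordering; not kernel code. */
enum model_event { EV_CPU_DIE, EV_STOP_CPUS, EV_CLEANUP };

#define MAX_EVENTS 8

static enum model_event events[MAX_EVENTS];
static int n_events;
static int online_cpus;

static void record(enum model_event ev)
{
	if (n_events < MAX_EVENTS)
		events[n_events++] = ev;
}

static void model_reset(int cpus)
{
	assert(cpus >= 1);
	n_events = 0;
	online_cpus = cpus;
}

/* Stand-in for running the hv_cpu_die() step on every CPU. */
static void model_cpu_die_all(void)
{
	record(EV_CPU_DIE);
}

/* Stand-in for native_machine_shutdown(), which stops the other CPUs. */
static void model_stop_other_cpus(void)
{
	online_cpus = 1;
	record(EV_STOP_CPUS);
}

/*
 * Stand-in for hyperv_cleanup(). Returns -1 if other CPUs were still
 * online (they could still touch the hypercall page), 0 otherwise.
 */
static int model_hyperv_cleanup(void)
{
	record(EV_CLEANUP);
	return online_cpus > 1 ? -1 : 0;
}

/* Ordering described as problematic: cleanup before the CPUs are stopped. */
static int model_shutdown_old(int cpus)
{
	int rc;

	model_reset(cpus);
	rc = model_hyperv_cleanup();
	model_stop_other_cpus();
	return rc;
}

/* Assumed fixed ordering: per-CPU teardown, stop CPUs, then cleanup. */
static int model_shutdown_new(int cpus)
{
	model_reset(cpus);
	model_cpu_die_all();
	model_stop_other_cpus();
	return model_hyperv_cleanup();
}
```

With four CPUs, model_shutdown_old(4) reports the race and model_shutdown_new(4) does not. With a single CPU both orderings are safe in this model, which is consistent with the problem only showing up while other CPUs are still running.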
The code looks a bit intrusive. On the other hand, this does sound like
something that needs backporting to older stable kernels.

On a more practical note, I need to decide whether to take it via
hyperv-fixes or hyperv-next. What do you think?
I'd like to see this in hyperv-fixes and backported to older stable kernels.
In its current form, the kexec path in a Hyper-V guest has multiple problems
that make it unreliable, so the downside risk of taking these fixes is minimal
while the upside benefit is considerable.

Michael