Re: [PATCH] kexec/crash: no crash update when kexec in progress
From: Sourabh Jain <hidden>
Date: 2024-09-05 08:38:31
Also in:
kexec, lkml
Possibly related (same subject, not in this thread)
- 2024-08-01 · Re: [PATCH] kexec/crash: no crash update when kexec in progress · Sourabh Jain <hidden>
Hello Baoquan, On 05/09/24 08:53, Baoquan He wrote:
On 09/04/24 at 02:55pm, Sourabh Jain wrote:quoted
Hello Baoquan, On 30/08/24 16:47, Baoquan He wrote:quoted
On 08/20/24 at 12:10pm, Sourabh Jain wrote:quoted
Hello Baoquan,......snip...quoted
quoted
quoted
2. A patch to return early from the `crash_handle_hotplug_event()` function if `kexec_in_progress` is set to True. This is essentially my original patch.There's a race gap between the kexec_in_progress checking and the setting it to true which Michael has mentioned.The window where kernel is holding kexec_lock to do kexec boot but kexec_in_progress is yet not set to True. If kernel needs to handle crash hotplug event, the function crash_handle_hotplug_event() will not get the kexec_lock and error out by printing error message about not able to update kdump image.But you wanted to avoid the erroring out if it's being in kernel_kexec(). Now you are seeing at least one the noising message, aren't you?
Yes, but it is very rare to encounter. My comments on your updated code are inline below.
quoted hunk ↗ jump to hunk
quoted
I think it should be fine. Given that lock is already taken for kexec kernel boot. Am I missing something major?quoted
That's why I think maybe checking kexec_in_progress after failing to retriving __kexec_lock is a little better, not very sure.Try for kexec lock before kexec_in_progress check will not solve the original problem this patch trying to solve. You proposed the below changes earlier: - if (!kexec_trylock()) { + if (!kexec_trylock() && kexec_in_progress) { pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock();Ah, I meant as below, but wrote it mistakenly.diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 63cf89393c6e..e7c7aa761f46 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c@@ -504,7 +504,7 @@ int crash_check_hotplug_support(void) crash_hotplug_lock(); /* Obtain lock while reading crash information */ - if (!kexec_trylock()) { + if (!kexec_trylock() && !kexec_in_progress) { pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock(); return 0;quoted
Once the kexec_in_progress is set to True there is no way one can get kexec_lock. So kexec_trylock() before kexec_in_progress is not helpful for the problem I am trying to solve.With your patch, you could still get the error message if the race gap exist. With above change, you won't get it. Please correct me if I am wrong.
The above code will print an error message during the race gap. Here's why: Let’s say the kexec lock is acquired in the kernel_kexec() function, but kexec_in_progress is not yet set to True. In this scenario, the code will print an error message. There is another issue I see with the above code: Consider that the system is on the kexec kernel boot path, and kexec_in_progress is set to True. If crash_hotplug_unlock() is called, the kernel will not only update the kdump image without acquiring the kexec lock, but it will also release the kexec lock in the out label. I believe this is incorrect. Please share your thoughts. Thanks, Sourabh Jain