Re: [PATCH] kexec/crash: no crash update when kexec in progress
From: Baoquan He <bhe@redhat.com>
Date: 2024-09-08 10:30:47
Also in:
kexec, lkml
Subsystem:
kdump, the rest · Maintainers:
Andrew Morton, Baoquan He, Mike Rapoport, Pasha Tatashin, Pratyush Yadav, Linus Torvalds
Possibly related (same subject, not in this thread)
- 2024-08-01 · Re: [PATCH] kexec/crash: no crash update when kexec in progress · Sourabh Jain <hidden>
On 09/05/24 at 02:07pm, Sourabh Jain wrote:
Hello Baoquan, On 05/09/24 08:53, Baoquan He wrote:quoted
On 09/04/24 at 02:55pm, Sourabh Jain wrote:quoted
Hello Baoquan, On 30/08/24 16:47, Baoquan He wrote:quoted
On 08/20/24 at 12:10pm, Sourabh Jain wrote:quoted
Hello Baoquan,......snip...quoted
quoted
quoted
2. A patch to return early from the `crash_handle_hotplug_event()` function if `kexec_in_progress` is set to True. This is essentially my original patch.There's a race gap between the kexec_in_progress checking and the setting it to true which Michael has mentioned.The window where kernel is holding kexec_lock to do kexec boot but kexec_in_progress is yet not set to True. If kernel needs to handle crash hotplug event, the function crash_handle_hotplug_event() will not get the kexec_lock and error out by printing error message about not able to update kdump image.But you wanted to avoid the erroring out if it's being in kernel_kexec(). Now you are seeing at least one the noising message, aren't you?Yes, but it is very rare to encounter. My comments on your updated code are inline below.quoted
quoted
I think it should be fine. Given that lock is already taken for kexec kernel boot. Am I missing something major?quoted
That's why I think maybe checking kexec_in_progress after failing to retriving __kexec_lock is a little better, not very sure.Try for kexec lock before kexec_in_progress check will not solve the original problem this patch trying to solve. You proposed the below changes earlier: - if (!kexec_trylock()) { + if (!kexec_trylock() && kexec_in_progress) { pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock();Ah, I meant as below, but wrote it mistakenly.diff --git a/kernel/crash_core.c b/kernel/crash_core.c index 63cf89393c6e..e7c7aa761f46 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c@@ -504,7 +504,7 @@ int crash_check_hotplug_support(void) crash_hotplug_lock(); /* Obtain lock while reading crash information */ - if (!kexec_trylock()) { + if (!kexec_trylock() && !kexec_in_progress) { pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock(); return 0;quoted
Once the kexec_in_progress is set to True there is no way one can get kexec_lock. So kexec_trylock() before kexec_in_progress is not helpful for the problem I am trying to solve.With your patch, you could still get the error message if the race gap exist. With above change, you won't get it. Please correct me if I am wrong.The above code will print an error message during the race gap. Here's why: Let’s say the kexec lock is acquired in the kernel_kexec() function, but kexec_in_progress is not yet set to True. In this scenario, the code will print an error message. There is another issue I see with the above code: Consider that the system is on the kexec kernel boot path, and kexec_in_progress is set to True. If crash_hotplug_unlock() is called, the kernel will not only update the kdump image without acquiring the kexec lock, but it will also release the kexec lock in the out label. I believe this is incorrect. Please share your thoughts.
How about this?
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 63cf89393c6e..8ba7b1da0ded 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c@@ -505,7 +505,8 @@ int crash_check_hotplug_support(void) crash_hotplug_lock(); /* Obtain lock while reading crash information */ if (!kexec_trylock()) { - pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); + if (!kexec_in_progress) + pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock(); return 0; }
@@ -540,7 +541,8 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, crash_hotplug_lock(); /* Obtain lock while changing crash information */ if (!kexec_trylock()) { - pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); + if (!kexec_in_progress) + pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n"); crash_hotplug_unlock(); return; }