Re: [PATCH v3 4/4] panic: use sys_info_with_filter() to avoid duplicate backtraces
From: Bradley Morgan <hidden>
Date: 2026-06-26 14:35:28
Also in:
lkml, stable
On June 26, 2026 3:26:11 PM GMT+01:00, Petr Mladek [off-list ref] wrote:
On Fri 2026-06-26 13:32:38, Bradley Morgan wrote:
On June 26, 2026 1:17:13 PM GMT+01:00, Bradley Morgan [off-list ref] wrote:
On June 26, 2026 1:14:14 PM GMT+01:00, Petr Mladek [off-list ref] wrote:
On Fri 2026-06-26 12:23:50, Petr Mladek wrote:
On Thu 2026-06-25 15:25:58, Bradley Morgan wrote:

But it all becomes very hairy. We have several levels:

  + watchdog-all_bt-specific option, e.g. sysctl_hardlockup_all_cpu_backtrace
  + watchdog-specific si_info preferences, e.g. hardlockup_si_mask
  + panic-specific si_info: panic_print
  + universal fallback for any layer: kernel_si_info

Now, we try to check all these variables back and forth to trigger all backtraces or to avoid triggering them. It clearly does not work well, and the code is getting more and more hairy.

I am thinking about another approach. The word "waterfall" comes to my mind.
Instead of checking all the settings back and forth, let's process each setting one by one, remember what has been done, and skip it at the next level. All the si_info actions seem to dump a global system state. So, it would make sense to remember the state in a global variable even when it might be modified by several CPUs in parallel.

Hmm.. new idea: kernel/dump_filter.c? What this file could do is handle a generic lockup state machine so that any subsystem can record what it has already dumped? I know it may bloat, but it's better than cramming fixes in.

I am not sure what exactly you would like to achieve but it sounds a bit scary ;-)

Anyway, we should definitely not synchronize the watchdog reports against each other. They are running in incompatible contexts (task vs. interrupt vs. NMI). Also, we should not add any locking because they usually print something when the system already has enough trouble.

Also, I think that it is not worth preventing duplicated backtraces or reports from a single CPU. IMHO, it is not a big problem in practice. So, we are down to large reports, like backtraces from all CPUs, timers, locks, ... which are handled by sys_info(). So, I think that it should be enough to handle this inside the sys_info() API.

I do not want to say that my proposal was the best solution. I am sure that there are better ones. But we need to consider the gain vs. complexity. Honestly, I am already a bit scared by the complexity which we added with the sys_info() API. And it is hard to imagine that adding another API would make it easier. But I might be wrong.

Instead, it might make sense to integrate the conflicting subsystem-specific calls under the sys_info() API. I mean that, for example, watchdog_hardlockup_check() won't call trigger_allbutcpu_cpu_backtrace() directly but would call it via the sys_info() API so that sys_info() could keep track of it.
Something like:

	void sys_info_allbutcpu_bt(int cpu)
	{
		trigger_allbutcpu_cpu_backtrace(cpu);
		/*
		 * The caller likely printed the backtrace of the given @cpu
		 * on its own. Prevent duplicate backtraces from all
		 * CPUs with a potential later sys_info() call.
		 */
		sys_info_done(SYS_INFO_ALL_BT);
	}

But I am not sure if it is really easier to follow than calling sys_info_done() from the watchdog code.

Some watchdogs try to optimize the output and print backtraces only from CPUs which are relevant for the given lockup. We should keep the logic for selecting the set of CPUs in the watchdog code. We just need to solve how to elegantly make sys_info() aware of it, or at least of the more massive reports.

Anyway, I would prefer to keep it simple until we see some problems in practice.

Best Regards,
Petr
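To make the "waterfall" idea concrete, here is a minimal userspace sketch, assuming a single global bitmask of sys_info() actions that have already been performed. The SI_* bits, the puts() stand-ins for the real dump routines, and hardlockup_report() are hypothetical; sys_info_done() only borrows the name used in the proposal above, and C11 atomics stand in for the kernel's atomic_long_t operations. It is not the real sys_info() implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical action bits standing in for the real SYS_INFO_* flags. */
#define SI_ALL_BT  (1UL << 0)	/* backtraces from all CPUs */
#define SI_LOCKS   (1UL << 1)	/* held locks */
#define SI_TIMERS  (1UL << 2)	/* timer state */
#define SI_KNOWN   (SI_ALL_BT | SI_LOCKS | SI_TIMERS)

/* Global record of already-performed actions, shared by all CPUs. */
static atomic_ulong si_done;

/* Record actions that a caller has performed on its own. */
static void sys_info_done(unsigned long mask)
{
	atomic_fetch_or(&si_done, mask);
}

/*
 * Perform only the actions in @mask that no earlier level has done.
 * atomic_fetch_or() sets the bits and returns the previous value, so
 * when CPUs race, each action is performed by exactly one of them,
 * without taking any lock. Returns the actions performed by this call.
 */
static unsigned long sys_info(unsigned long mask)
{
	unsigned long todo;

	assert((mask & ~SI_KNOWN) == 0);
	todo = mask & ~atomic_fetch_or(&si_done, mask);

	if (todo & SI_ALL_BT)
		puts("sys_info: backtraces from all CPUs");
	if (todo & SI_LOCKS)
		puts("sys_info: held locks");
	if (todo & SI_TIMERS)
		puts("sys_info: timers");
	return todo;
}

/* Levels 1 and 2 of the waterfall: a hypothetical hardlockup report. */
static void hardlockup_report(int cpu, bool all_cpu_bt, unsigned long si_mask)
{
	printf("watchdog: hard LOCKUP on cpu %d\n", cpu);
	if (all_cpu_bt) {
		/* Stand-in for trigger_allbutcpu_cpu_backtrace(cpu). */
		printf("backtraces from all CPUs but %d\n", cpu);
		sys_info_done(SI_ALL_BT);
	}
	sys_info(si_mask);
}
```

In this sketch, the panic_print and kernel_si_info levels would simply be further sys_info() calls with their own masks, and each one performs only what the earlier levels skipped. A real version would also have to decide when, if ever, the mask gets cleared.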
I understand it's scary to make a new file in the first place. But I was a bit vague about what I wanted, and I'm sorry.

The reason why I'd suggest a new file is that if any subsystem theoretically bypasses sys_info() to log a lockup, its dump completely misses the filter and gets duplicated. My file would act as a generic lockless state machine that any subsystem can update regardless of how it dumps logs.

If you have any questions, feel absolutely free to ask! :) Discussion is a way to make everyone happy!

Thanks!
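For comparison, a rough userspace sketch of the kernel/dump_filter.c idea, assuming one bit per dump category in a shared atomic word. enum dump_type, dump_filter_claim(), dump_filter_mark() and dump_filter_reset() are hypothetical names, not an existing kernel API, and C11 atomics stand in for kernel atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical dump categories shared by all subsystems. */
enum dump_type {
	DUMP_ALL_CPU_BT,
	DUMP_LOCKS,
	DUMP_TIMERS,
	DUMP_NR,
};

/* One bit per dump_type; a set bit means the dump was already emitted. */
static atomic_uint dump_filter_state;

static unsigned int dump_bit(enum dump_type type)
{
	assert((unsigned int)type < (unsigned int)DUMP_NR);
	return 1u << type;
}

/*
 * Try to claim @type before dumping it. Returns true only for the first
 * caller, which is then responsible for the dump; everyone else skips.
 * A single atomic read-modify-write, no locks.
 */
static bool dump_filter_claim(enum dump_type type)
{
	unsigned int bit = dump_bit(type);

	return !(atomic_fetch_or(&dump_filter_state, bit) & bit);
}

/* Record a dump that was already produced outside the filter. */
static void dump_filter_mark(enum dump_type type)
{
	atomic_fetch_or(&dump_filter_state, dump_bit(type));
}

/* Forget everything, e.g. once the lockup report is complete. */
static void dump_filter_reset(void)
{
	atomic_store(&dump_filter_state, 0);
}
```

A subsystem would call dump_filter_claim() before a dump and skip it when that returns false, while code that has already printed something on its own would call dump_filter_mark(). Claiming before dumping means a racing second caller skips immediately instead of waiting for the first dump to finish.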