Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
From: Breno Leitao <leitao@debian.org>
Date: 2025-09-10 18:23:13
Also in:
lkml
On Wed, Sep 10, 2025 at 02:28:40PM +0206, John Ogness wrote:
> On 2025-09-09, Breno Leitao [off-list ref] wrote:
>> b) Send the message anyway (and hope for the best)
>>    Cons: Netpoll will continue to call IRQ unsafe locks from IRQ safe
>>          context (lockdep will continue to be unhappy)
>>    Pro: This is how it works today already, so, it is not making the
>>         problem worse. In fact, it is narrowing the problem to only
>>         .write_atomic().
>
> Two concerns here:
>
> 1. ->write_atomic() is also used during normal operation
>
> 2. It is expected that ->write_atomic() callbacks are implemented
>    safely. The other nbcon citizens are doing this. Having an nbcon
>    driver with an unsafe ->write_atomic() puts all nbcon drivers at
>    risk of not functioning during panic.
>
> This could be combined with (a) so that ->write_atomic() implements its
> own deferred queue of messages to print and only when
> @legacy_allow_panic_sync is true, will it try to send immediately and
> hope for the best. @legacy_allow_panic_sync is set after all nbcon
> drivers have had a chance to flush their buffers safely and then the
> kernel starts to allow less safe drivers to flush.
>
> Although I would prefer the NBCON_ATOMIC_UNSAFE approach instead.

Agree. That seems a more straightforward solution for drivers, and it would clearly help the netconsole case.

>> c) Not implementing .write_atomic
>>    Cons: we lose the most important messages of the boot.
>>
>> Any other option I am not seeing?
>
> d) Not implementing ->write_atomic() and instead implement a
>    kmsg_dumper for netconsole. This registers a callback that is
>    called during panic.
>
>    Con: The kmsg_dumper interface has nothing to do with consoles, so
>         it would require some effort coordinating with the console
>         drivers.

I am looking at the kmsg_dumper interface, and it doesn't provide the buffers that need to be dumped. So, if I understand correctly, my kmsg_dumper callback needs to loop over the message buffer and print the remaining messages, right? In other words, do I need to track which messages were already sent by netconsole, and then iterate over the kmsg buffer to find the messages that haven't been sent, and send from there?
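
To make the question concrete, here is a rough userspace model of the bookkeeping I have in mind. It is only a sketch: `log_ring`, `dumper_state`, `log_store()`, `mark_sent()`, `dump_unsent()` and `print_send()` are all hypothetical names made up for illustration, and the toy ring only stands in for the real kmsg buffer; none of this is the actual kmsg_dumper or netconsole code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define LOG_CAP 8	/* hypothetical ring capacity */

struct record {
	uint64_t seq;		/* monotonically increasing sequence number */
	const char *text;
};

/* Toy stand-in for the kernel message buffer (hypothetical). */
struct log_ring {
	struct record rec[LOG_CAP];
	uint64_t first_seq;	/* oldest record still stored */
	uint64_t next_seq;	/* sequence number the next record will get */
};

/* What netconsole would have to remember between normal sends and panic. */
struct dumper_state {
	uint64_t next_unsent;	/* first sequence number not sent yet */
	uint64_t lost;		/* records overwritten before they were sent */
};

typedef void (*send_fn)(const struct record *rec, void *ctx);

/* Append a record, overwriting the oldest one when the ring is full. */
static void log_store(struct log_ring *r, const char *text)
{
	struct record *slot = &r->rec[r->next_seq % LOG_CAP];

	slot->seq = r->next_seq++;
	slot->text = text;
	if (r->next_seq - r->first_seq > LOG_CAP)
		r->first_seq = r->next_seq - LOG_CAP;
}

/* Normal (non-panic) path: note that @seq went out on the wire. */
static void mark_sent(struct dumper_state *d, uint64_t seq)
{
	if (seq >= d->next_unsent)
		d->next_unsent = seq + 1;
}

/* Panic path: walk the ring from the first unsent record and send the rest. */
static unsigned int dump_unsent(const struct log_ring *r,
				struct dumper_state *d,
				send_fn send, void *ctx)
{
	uint64_t seq = d->next_unsent;
	unsigned int sent = 0;

	/* Records that were overwritten before we got to them are gone. */
	if (seq < r->first_seq) {
		d->lost += r->first_seq - seq;
		seq = r->first_seq;
	}

	for (; seq < r->next_seq; seq++) {
		const struct record *rec = &r->rec[seq % LOG_CAP];

		assert(rec->seq == seq);
		send(rec, ctx);
		sent++;
	}

	d->next_unsent = r->next_seq;
	return sent;
}

/* Example transport: prints and counts, standing in for a network send. */
static void print_send(const struct record *rec, void *ctx)
{
	unsigned int *count = ctx;

	printf("[%llu] %s\n", (unsigned long long)rec->seq, rec->text);
	(*count)++;
}
```

In this model the panic callback would call something like `dump_unsent(&ring, &state, print_send, &count)` once, and everything stored after the last `mark_sent()` goes out. My question is whether the real callback has to carry the equivalent of `next_unsent` itself.
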

>    Pro: There is absolute freedom for the dumper to implement its own
>         panic-only solution to get messages out.

What about .write_atomic() calls that are not made during panic? Will those be lost in this approach?

> e) Involve support from the underlying network drivers to implement true
>    atomic sending. Thomas Gleixner talked [0] very briefly about how
>    this could be implemented for netconsole during the 2022
>    proof-of-concept presentation of the nbcon API.
>
>    Cons: It most likely requires new API callbacks for the network
>          drivers to implement hardware-specific solutions. Many (most?)
>          drivers would not be able to support it.
>
>    Pro: True reliable atomic printing via network.

That would make more sense, but it seems that deciding on it is above my pay grade. :-)

Thanks for helping us with this issue,
--breno