Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

(off-list ancestor, not in this archive)
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-14
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Pavel Begunkov <asml.silence@gmail.com> · 2025-08-14
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Jakub Kicinski <kuba@kernel.org> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Pavel Begunkov <asml.silence@gmail.com> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Jakub Kicinski <kuba@kernel.org> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Jakub Kicinski <kuba@kernel.org> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-18
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Calvin Owens <hidden> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-16
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Pavel Begunkov <asml.silence@gmail.com> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-18
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-19
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-20
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-20
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-21
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-21
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-21
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-22
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-08-26
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-26
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-05
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-09-06
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-08
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-09-08
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Calvin Owens <hidden> · 2025-09-08
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-09-09
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Petr Mladek <pmladek@suse.com> · 2025-09-10
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-09-09
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-10
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Petr Mladek <pmladek@suse.com> · 2025-09-10
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-09-10
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Calvin Owens <hidden> · 2025-09-30
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-30
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Sebastian Siewior <bigeasy@linutronix.de> · 2025-09-30
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-09-30
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-10-01
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-11
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Breno Leitao <leitao@debian.org> · 2025-09-10
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · John Ogness <john.ogness@linutronix.de> · 2025-09-11
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-21
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-21
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Calvin Owens <hidden> · 2025-08-15
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Johannes Berg <johannes@sipsolutions.net> · 2025-08-26
Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning · Mike Galbraith <hidden> · 2025-08-15

From: John Ogness <john.ogness@linutronix.de>
Date: 2025-09-05 12:48:34
Also in: lkml

Hi,

Sorry for jumping in so late here. I just stumbled upon this thread.

Without understanding the details of the netconsole locking and network
performance, I would like to mention some things about the NBCON
interface to clear up some apparent confusion (most likely due to
insufficient documentation, which I am happy to improve upon).

Comments below...

On 2025-08-26, Breno Leitao [off-list ref] wrote:

On Fri, Aug 22, 2025 at 05:54:28AM +0200, Mike Galbraith wrote:

quoted

On Thu, 2025-08-21 at 10:35 -0700, Breno Leitao wrote:

quoted

On Thu, Aug 21, 2025 at 05:51:59AM +0200, Mike Galbraith wrote:

quoted

--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c

@@ -1952,12 +1952,12 @@ static void netcon_write_thread(struct c

 static void netconsole_device_lock(struct console *con, unsigned long *flags)
 {
 	/* protects all the targets at the same time */
-	spin_lock_irqsave(&target_list_lock, *flags);
+	spin_lock(&target_list_lock);

I personally think this target_list_lock can be moved to an RCU lock.

If that is doable, then we probably make netconsole_device_lock()
to a simple `rcu_read_lock()`, which would solve this problem as well.

The bigger issue for the nbcon patch would seem to be the seemingly
required .write_atomic leading to landing here with disabled IRQs.

Using spin_lock_irqsave()/spin_unlock_irqrestore() within the
->device_lock() and ->device->unlock() callbacks is fine. Even with
PREEMPT_RT this is fine. If you can use RCU to synchronize the target
list, that is probably a nice optimization, but it is certainly not a
requirement from the NBCON (and PREEMPT_RT/lockdep) perspective.

In this case, instead of transmitting through netpoll directly in the
.write_atomic context, we could queue the messages for later delivery.

The ->write_atomic() callback is intended to perform immediate
transmission. It is called with hardware interrupts disabled and is even
expected to work from NMI context. If you are not able to implement
these requirements, do not implement ->write_atomic(). Implementing some
sort of deferrment mechanism is inappropriate. Such a mechanism already
exists based on ->write_thread().

With the current implementation, this is not straightforward unless we
introduce an additional message copy at the start of .write_atomic.

This is where the interface between netpoll and netconsole becomes
problematic. Ideally, we would avoid carrying extra data into netconsole
and instead copy the message into an SKB and queue the SKB for
transmission.

The core issue is that netpoll and netconsole are tightly coupled, and
several pieces of functionality that live in netpoll really belong in
netconsole. A good example is the SKB pool: that’s a netconsole concept,
not a netpoll one. None of the other netpoll users send raw char *
messages. They all work directly with skbs, so, in order to achieve it,
we need to move the concept of skb pool into netconsole, and give
netconsole the management of the skb pool.

quoted

WRT my patch, seeing a hard RT crash on wired box cleanly logged with
your nbcon patch applied (plus my twiddle mentioned earlier) tells me
my patch has lost its original reason to exist.  It's relevant to this
thread only in that those once thought to be RT specific IRQ disable
spots turned out to actually be RT agnostic wireless sore spots.

Thanks. As a follow-up, I would suggest the following steps:

1) Decouple the SKB pool from netpoll and move it into netconsole

  * This makes netconsole behave like any other netpoll user,
    interacting with netpoll by sending SKBs.
	* The SKB population logic would then reside in netconsole, where it
	  logically belongs.

  * Enable NBCONS in netconsole, guarded by NETCONSOLE_NBCON
	* In normal .write_atomic() mode, messages should be queued in
	  a workqueue.

This is the wrong approach. It cannot be expected that the workqueue is
functional during panic. ->write_atomic() needs to be able to write
directly, most likely using pre-allocated SKBs and pre-setup dedicated
network queues.

As an example, the graphics people implemented the Blue-Screen-Of-Death
by preallocating a separate graphics buffer. In case of a crash, the
hardware simply switches to the "crash buffer", rather than trying to
integrate and take control of the graphics buffer already in use.

However, it is also important to note that the graphics consoles do not
implement the ->write_atomic() callback. Instead they register a
kmsg_dumper to display the panic upon crash. This may also be a good
approach for netconsole if ->write_atomic() callbacks are not available.

	* If oops_in_progress is set, we bypass the queue and
	  transmit the SKB immediately. (Maybe disabling lockdep?!).

NBCON is meant to deprecate @oops_in_progress. However, it is true that
consoles not implementing ->write_atomic() will never print panic
output.

John Ogness

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help