Thread: 45 messages, 9 authors, 2025-10-01

Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

From: Calvin Owens <hidden>
Date: 2025-08-15 19:10:44
Also in: lkml

On Friday 08/15 at 10:29 -0700, Breno Leitao wrote:
On Fri, Aug 15, 2025 at 09:42:17AM -0700, Jakub Kicinski wrote:
On Fri, 15 Aug 2025 11:44:45 +0100 Pavel Begunkov wrote:
On 8/15/25 01:23, Jakub Kicinski wrote:
I suspect disabling netconsole over WiFi may be the most sensible way out.

I believe we might be facing a similar issue with virtio-net.
Specifically, any network adapter where TX is not safe to use in IRQ
context encounters this problem.

If we want to keep netconsole enabled on all TX paths, a possible
solution is to defer the transmission work when netconsole is called
inside an IRQ.

The idea is that netconsole first checks if it is running in an IRQ
context using in_irq(). If so, it queues the skb without transmitting it
immediately and schedules deferred work to handle the transmission
later.

A rough implementation could be:

static void send_udp(struct netconsole_target *nt, const char *msg, int len)
{
	struct netpoll *np = &nt->np;
	struct sk_buff *skb;

	/* Get an skb that is already populated with all the headers
	 * and ready to be sent. netpoll_get_skb() is a proposed helper.
	 */
	skb = netpoll_get_skb(np, msg, len);
	if (!skb)
		return;

	if (in_irq()) {
		/* delayed_queue and flush_delayed_queue are proposed
		 * new members of struct netpoll.
		 */
		skb_queue_tail(&np->delayed_queue, skb);
		schedule_delayed_work(&np->flush_delayed_queue, 0);
		return;
	}

	__netpoll_send_skb(np, skb);
}

This approach does not require additional memory or extra data copying,
since copying from the printk buffer to the skb must be performed
regardless.

The main drawback is a slight delay for messages sent from within an IRQ
context, though I believe such cases are infrequent.

We could potentially also perform the flush from softirq context, which
would help reduce this latency further.

If we take an OOPS in any IRQ, I suspect that delayed_work will never
get a chance to run, and we'll now lose all such OOPSes over netconsole?
I don't think softirq would get a chance either in that case?

Clearly, if it was a net driver's IRQ, that's not likely to happen
anyway. But in my experience, OOPSes in IRQs other than the driver
underlying netconsole's netdev *do* get emitted pretty reliably.

If your condition instead becomes:

    if (in_irq() && !oops_in_progress)

...I think we can have our cake and eat it too? In an OOPS we're
busting locks and such, all bets are off anyway. Although, I suppose
that might still drop messages emitted immediately before it...
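
To make that trade-off concrete, here is a small userspace model of the
condition above. It is only a sketch and not kernel code: struct model,
model_send() and model_flush() are hypothetical stand-ins for the IRQ
state, the send path and the deferred work, and the queue is a plain
array rather than an skb queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Userspace stand-ins for kernel state; every name here is hypothetical. */
struct model {
	bool in_irq;                   /* models in_irq() */
	bool oops_in_progress;         /* models the oops_in_progress flag */
	const char *queued[QUEUE_CAP]; /* models the deferred skb queue */
	size_t nqueued;                /* messages waiting for the deferred work */
	size_t nsent;                  /* messages that reached the wire */
};

/* Returns true if the message was sent immediately, false if deferred. */
static bool model_send(struct model *m, const char *msg)
{
	if (m->in_irq && !m->oops_in_progress) {
		assert(m->nqueued < QUEUE_CAP);
		m->queued[m->nqueued++] = msg;
		return false;
	}
	m->nsent++;
	return true;
}

/* Models the deferred work running. If the machine dies before this is
 * called, whatever is still queued is lost.
 */
static size_t model_flush(struct model *m)
{
	size_t flushed = m->nqueued;

	m->nsent += flushed;
	m->nqueued = 0;
	return flushed;
}
```

In this model, a message emitted in IRQ context before the OOPS is queued,
while a message emitted after oops_in_progress is set goes out directly.
The queued message only reaches the wire if model_flush() ever runs, which
is the "might still drop messages emitted immediately before it" case.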