Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
From: Calvin Owens <hidden>
Date: 2025-08-15 19:10:44
Also in: lkml
On Friday 08/15 at 10:29 -0700, Breno Leitao wrote:
> On Fri, Aug 15, 2025 at 09:42:17AM -0700, Jakub Kicinski wrote:
>> On Fri, 15 Aug 2025 11:44:45 +0100 Pavel Begunkov wrote:
>>> On 8/15/25 01:23, Jakub Kicinski wrote:
>>>> I suspect disabling netconsole over WiFi may be the most sensible
>>>> way out.
>
> I believe we might be facing a similar issue with virtio-net.
> Specifically, any network adapter where TX is not safe to use in IRQ
> context encounters this problem.
>
> If we want to keep netconsole enabled on all TX paths, a possible
> solution is to defer the transmission work when netconsole is called
> inside an IRQ.
>
> The idea is that netconsole first checks if it is running in an IRQ
> context using in_irq(). If so, it queues the skb without transmitting
> it immediately and schedules deferred work to handle the transmission
> later.
>
> A rough implementation could be:
>
> static void send_udp(struct netconsole_target *nt, const char *msg, int len)
> {
> 	/* get the SKB that is already populated, with all the headers
> 	 * and ready to be sent
> 	 */
> 	struct sk_buff *skb = netpoll_get_skb(&nt->np, msg, len);
>
> 	if (in_irq()) {
> 		skb_queue_tail(&nt->np.delayed_queue, skb);
> 		schedule_delayed_work(flush_delayed_queue, 0);
> 		return;
> 	}
>
> 	__netpoll_send_skb(&nt->np, skb);
> }
>
> This approach does not require additional memory or extra data copying,
> since copying from the printk buffer to the skb must be performed
> regardless.
>
> The main drawback is a slight delay for messages sent from within an
> IRQ context, though I believe such cases are infrequent.
>
> We could potentially also perform the flush from softirq context, which
> would help reduce this latency further.

If we take an OOPS in any IRQ, I suspect that delayed_work will never
get a chance to run, and we'll now lose all such OOPSes over netconsole?
I don't think softirq would get a chance either in that case?

Clearly, if it was a net driver's IRQ, that's not likely to happen
anyway. But in my experience, OOPSes in IRQs other than the driver
underlying netconsole's netdev *do* get emitted pretty reliably.

If your condition instead becomes:

	if (in_irq() && !oops_in_progress)

...I think we can have our cake and eat it too? In an OOPS we're
busting locks and such, all bets are off anyway. Although, I suppose
that might still drop messages emitted immediately before it...
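
To make the trade-off concrete, here is a minimal userspace model of that condition. It is only a sketch, not kernel code: struct model_target and the model_* helpers are hypothetical names, the two bool fields stand in for in_irq() and oops_in_progress, and "sending" is just a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. Nothing here is a real netconsole/netpoll API;
 * every name below is made up for illustration. */
#define MODEL_QUEUE_MAX 8

struct model_target {
	bool in_irq;            /* stand-in for in_irq() */
	bool oops_in_progress;  /* stand-in for the kernel's oops_in_progress */
	const char *queued[MODEL_QUEUE_MAX]; /* messages deferred for later */
	size_t nr_queued;
	size_t nr_sent;         /* messages "transmitted" so far */
};

/* Defer only in hard IRQ context, and never while an oops is in progress. */
static bool model_should_defer(const struct model_target *t)
{
	return t->in_irq && !t->oops_in_progress;
}

/* Returns true if the message was sent immediately, false if it was
 * deferred (or dropped because the model queue is full). */
static bool model_send(struct model_target *t, const char *msg)
{
	if (model_should_defer(t)) {
		if (t->nr_queued < MODEL_QUEUE_MAX)
			t->queued[t->nr_queued++] = msg;
		return false;
	}
	t->nr_sent++;
	return true;
}

/* Stand-in for the deferred work: sends everything queued so far and
 * returns how many messages it flushed. */
static size_t model_flush(struct model_target *t)
{
	size_t n = t->nr_queued;

	t->nr_sent += n;
	t->nr_queued = 0;
	return n;
}
```

In this model, a message sent from IRQ context just before the oops sits in the queue, while the oops message itself goes out directly because oops_in_progress is set. The queued message only gets out if model_flush() ever runs, which is exactly the case I'm worried about for messages emitted immediately before the OOPS.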