Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning
From: Pavel Begunkov <asml.silence@gmail.com>
Date: 2025-08-15 20:01:13
Also in: lkml
On 8/15/25 18:29, Breno Leitao wrote:
> On Fri, Aug 15, 2025 at 09:42:17AM -0700, Jakub Kicinski wrote:
> > On Fri, 15 Aug 2025 11:44:45 +0100 Pavel Begunkov wrote:
> > > On 8/15/25 01:23, Jakub Kicinski wrote:
> > > > I suspect disabling netconsole over WiFi may be the most sensible way out.
>
> I believe we might be facing a similar issue with virtio-net.
> Specifically, any network adapter where TX is not safe to use in IRQ
> context encounters this problem.
>
> If we want to keep netconsole enabled on all TX paths, a possible
> solution is to defer the transmission work when netconsole is called
> inside an IRQ.
>
> The idea is that netconsole first checks if it is running in an IRQ
> context using in_irq(). If so, it queues the skb without transmitting
> it immediately and schedules deferred work to handle the transmission
> later.
>
> A rough implementation could be:
>
> static void send_udp(struct netconsole_target *nt, const char *msg, int len)
> {
> 	/* get the SKB that is already populated, with all the headers
> 	 * and ready to be sent
> 	 */
> 	struct sk_buff = netpoll_get_skb(&nt->np, msg, len);
>
> 	if (in_irq()) {

It's not just irq handlers but any context that has irqs disabled, and
since it's nested under irq-safe console_owner it'd need to always be
deferred or somehow moved out of the console_owner critical section.
Maybe there is printk lock trickery I don't understand, however.

> 		skb_queue_tail(&np->delayed_queue, skb);
> 		schedule_delayed_work(flush_delayed_queue, 0);
> 		return;
> 	}
>
> 	return __netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> }
>
> This approach does not require additional memory or extra data copying,
> since copying from the printk buffer to the skb must be performed
> regardless.
>
> The main drawback is a slight delay for messages sent from within an
> IRQ context, though I believe such cases are infrequent.
>
> We could potentially also perform the flush from softirq context, which
> would help reduce this latency further.
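
To illustrate the "always defer" variant, here is a rough userspace model, not kernel code. The write path only queues and never transmits, so nothing that takes IRQ-unsafe locks runs inside the console critical section. Every name in it (deferred_queue, console_write_deferred, flush_deferred, fake_xmit) is made up for the sketch, and it leaves out locking and real skb handling entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SLOTS 8
#define MSG_MAX     64

/* Hypothetical stand-in for a queue of pending skbs. */
struct deferred_queue {
	char   msgs[QUEUE_SLOTS][MSG_MAX];
	size_t head;   /* next slot to flush */
	size_t count;  /* messages waiting to be sent */
};

static int sent_count; /* messages "transmitted" so far */

/* Stand-in for the driver TX path, which may take IRQ-unsafe locks. */
static void fake_xmit(const char *msg)
{
	(void)msg;
	sent_count++;
}

/*
 * Console write path: never transmits, only queues, regardless of
 * the calling context. Returns 0 on success, -1 if the queue is
 * full and the message is dropped.
 */
static int console_write_deferred(struct deferred_queue *q, const char *msg)
{
	size_t tail;

	if (q->count == QUEUE_SLOTS)
		return -1;

	tail = (q->head + q->count) % QUEUE_SLOTS;
	strncpy(q->msgs[tail], msg, MSG_MAX - 1);
	q->msgs[tail][MSG_MAX - 1] = '\0';
	q->count++;
	return 0;
}

/*
 * Deferred worker: assumed to run later, outside the console
 * critical section. Returns the number of messages flushed.
 */
static int flush_deferred(struct deferred_queue *q)
{
	int flushed = 0;

	while (q->count > 0) {
		fake_xmit(q->msgs[q->head]);
		q->head = (q->head + 1) % QUEUE_SLOTS;
		q->count--;
		flushed++;
	}
	return flushed;
}
```

The cost is visible in the model: the queue is bounded, so messages are dropped once it fills, and nothing goes out until the flush runs, which is a problem if the machine dies first.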
-- Pavel Begunkov