Thread: 45 messages, 9 authors, 2025-10-01

Re: netconsole: HARDIRQ-safe -> HARDIRQ-unsafe lock order warning

From: Pavel Begunkov <asml.silence@gmail.com>
Date: 2025-08-15 10:43:29
Also in: lkml

On 8/15/25 01:23, Jakub Kicinski wrote:
> On Thu, 14 Aug 2025 03:16:11 -0700 Breno Leitao wrote:
>>   2.2) netpoll 				// net poll will call the network subsystem to send the packet
>>   2.3) lock(&fq->lock);			// Try to get the lock while the lock was already held
The report for reference:

https://lore.kernel.org/all/fb38cfe5153fd67f540e6e8aff814c60b7129480.camel@gmx.de/

> Where does netpoll take fq->lock ?

the dependencies between the lock to be acquired
[  107.985514]  and HARDIRQ-irq-unsafe lock:
[  107.985531] -> (&fq->lock){+.-.}-{3:3} {
...
[  107.988053]  ... acquired at:
[  107.988054]    check_prev_add+0xfb/0xca0
[  107.988058]    validate_chain+0x48c/0x530
[  107.988061]    __lock_acquire+0x550/0xbc0
[  107.988064]    lock_acquire.part.0+0xa1/0x210
[  107.988068]    _raw_spin_lock_bh+0x38/0x50
[  107.988070]    ieee80211_queue_skb+0xfd/0x350 [mac80211]
[  107.988198]    __ieee80211_xmit_fast+0x202/0x360 [mac80211]
[  107.988314]    ieee80211_xmit_fast+0xfb/0x1f0 [mac80211]
[  107.988424]    __ieee80211_subif_start_xmit+0x14e/0x3d0 [mac80211]
[  107.988530]    ieee80211_subif_start_xmit+0x46/0x230 [mac80211]
[  107.988634]    netpoll_start_xmit+0x8b/0xd0
[  107.988638]    __netpoll_send_skb+0x329/0x3b0
[  107.988641]    write_msg+0x104/0x120 [netconsole]
[  107.988647]    console_emit_next_record+0x203/0x250
[  107.988652]    console_flush_all+0x24d/0x370
[  107.988657]    console_unlock+0x66/0x130
[  107.988662]    vprintk_emit+0x142/0x360
[  107.988666]    _printk+0x5b/0x80
[  107.988671]    enabled_store.cold+0x7e/0x83 [netconsole]
[  107.988677]    configfs_write_iter+0xbd/0x120 [configfs]
[  107.988683]    vfs_write+0x213/0x520
[  107.988689]    ksys_write+0x69/0xe0
[  107.988691]    do_syscall_64+0x94/0xa10
[  107.988695]    entry_SYSCALL_64_after_hwframe+0x4b/0x53

> We started hitting this a lot in the CI as well, lockdep must have
> gotten more sensitive in 6.17. Last I checked lockdep didn't understand
> that we manually test for nesting with netif_local_xmit_active().

FWIW, I remember there were similar reports last year but with
xmit lock.

Looks like Breno tried to simplify it, the original syz report
gave the following scenario:

[  107.984942] Chain exists of:
                  console_owner --> target_list_lock --> &fq->lock

[  107.984947]  Possible interrupt unsafe locking scenario:
[  107.984948]        CPU0                    CPU1
[  107.984949]        ----                    ----
[  107.984950]   lock(&fq->lock);
[  107.984952]                                local_irq_disable();
[  107.984952]                                lock(console_owner);
[  107.984954]                                lock(target_list_lock);
[  107.984956]   <Interrupt>
[  107.984957]     lock(console_owner);


Seems like with the fq->lock trace I pasted above we can get something like:

         CPU0                    CPU1
         ----                    ----
    lock(&fq->lock);
                                 local_irq_disable();
                                 lock(console_owner);
                                 lock(target_list_lock);
                                 lock(&fq->lock);
    <Interrupt>
      lock(console_owner);

Nesting checks won't help with this one.
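
To make the cycle explicit, here is a rough toy model in userspace
Python (not kernel code, and not how lockdep is implemented). It treats
the chain from the report as "taken while holding" edges and adds one
more edge for the interrupt case: fq->lock is taken with hardirqs
enabled, so an interrupt that wants console_owner can land on top of it.
The find_cycle() helper and the edge lists are made up for illustration
only.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in a directed graph as a list of nodes, or None."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    visiting, done = [], set()

    def dfs(node):
        if node in visiting:
            # Back edge: everything from the first visit of node is a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# "A -> B" means B was acquired while A was held (chain from the report).
CHAIN = [
    ("console_owner", "target_list_lock"),
    ("target_list_lock", "&fq->lock"),
]

# Assumption for this model: an interrupt arriving while &fq->lock is held
# (it is taken with hardirqs enabled) may try to take console_owner.
IRQ_EDGE = ("&fq->lock", "console_owner")

print(find_cycle(CHAIN))               # no cycle without the interrupt edge
print(find_cycle(CHAIN + [IRQ_EDGE]))  # cycle through all three locks
```

The cycle involves two CPUs, each holding a lock the other one wants,
so a check for re-entry on the local CPU has nothing to catch.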

-- 
Pavel Begunkov