Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress
From: Cong Wang <hidden>
Date: 2022-09-25 18:09:05
On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
William reports kernel soft-lockups on some OVS topologies when TC mirred
"egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
mirred action in egress-to-ingress can easily produce a dmesg splat like:
============================================
WARNING: possible recursive locking detected
6.0.0-rc4+ #511 Not tainted
--------------------------------------------
nc/1037 is trying to acquire lock:
ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
but task is already holding lock:
ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(slock-AF_INET/1);
lock(slock-AF_INET/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
12 locks held by nc/1037:
#0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
#1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
#6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
#7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
#8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
#9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
#10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
#11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
stack backtrace:
CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x5b
__lock_acquire.cold.76+0x121/0x2a7
lock_acquire+0xd5/0x310
_raw_spin_lock_nested+0x39/0x70
tcp_v4_rcv+0x1023/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_rcv_state_process+0x669/0x1290
tcp_v4_do_rcv+0xd7/0x370
tcp_v4_rcv+0x10bc/0x1160
ip_protocol_deliver_rcu+0x4d/0x280
ip_local_deliver_finish+0xac/0x160
ip_local_deliver+0x71/0x220
ip_rcv+0x5a/0x200
__netif_receive_skb_one_core+0x89/0xa0
netif_receive_skb+0x1c1/0x400
tcf_mirred_act+0x2a5/0x610 [act_mirred]
tcf_action_exec+0xb3/0x210
fl_classify+0x1f7/0x240 [cls_flower]
tcf_classify+0x7b/0x320
__dev_queue_xmit+0x3a4/0x11b0
ip_finish_output2+0x3b8/0xa10
ip_output+0x7f/0x260
__ip_queue_xmit+0x1ce/0x610
__tcp_transmit_skb+0xabc/0xc80
tcp_write_xmit+0x229/0x12c0
__tcp_push_pending_frames+0x32/0xf0
tcp_sendmsg_locked+0x297/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x58/0x70
__sys_sendto+0xfd/0x170
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f11a06fd281
Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
</TASK>
that is very similar to those observed by William in his setup.
By using netif_rx() for mirred ingress packets, packets are queued in the
backlog, like it's done in the receive path of "loopback" and "veth", and
the deadlock is not visible anymore. Also add a selftest that can be used
to reproduce the problem / verify the fix.

Which also means we can no longer know the RX path status, right? I mean, if we have filters on ingress, we can't know whether they drop this packet or not after this patch. To me, this at least breaks users' expectations.

BTW, have you thought about solving the above lockdep warning in the TCP layer?

Thanks.
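
To make the trade-off under discussion concrete, here is a toy model in Python. It is not kernel code: ToyStack, its fields, and the single-reply behaviour are all hypothetical, and only the names NET_RX_SUCCESS / NET_RX_DROP are borrowed from the kernel. It sketches two things, assuming the direct path behaves like a synchronous netif_receive_skb() call and the queued path like netif_rx(): direct delivery nests the receive path inside the transmit path (as in the backtrace above) but returns the ingress verdict, while queueing to a backlog avoids the nesting but can only report "queued".

```python
from collections import deque

# Names borrowed from the kernel; this is an illustrative toy, not kernel code.
NET_RX_SUCCESS, NET_RX_DROP = 0, 1


class ToyStack:
    """Hypothetical model of an egress-to-ingress redirect on one host."""

    def __init__(self, use_backlog, ingress_drops=False, replies=1):
        self.use_backlog = use_backlog      # True: queue, False: deliver directly
        self.ingress_drops = ingress_drops  # verdict of a made-up ingress filter
        self.replies = replies              # how many received packets trigger a reply
        self.backlog = deque()
        self.depth = 0                      # current nesting of the receive path
        self.max_depth = 0                  # deepest nesting observed
        self.verdicts = []                  # ingress verdicts, in completion order

    def receive(self, pkt):
        """Ingress path: apply the filter, then let the protocol reply."""
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            if self.ingress_drops:
                verdict = NET_RX_DROP
            else:
                verdict = NET_RX_SUCCESS
                if self.replies > 0:
                    self.replies -= 1
                    self.transmit("reply to " + pkt)
            self.verdicts.append(verdict)
            return verdict
        finally:
            self.depth -= 1

    def transmit(self, pkt):
        """Egress path whose redirect sends the packet back to ingress."""
        if self.use_backlog:
            self.backlog.append(pkt)
            return NET_RX_SUCCESS  # only means "queued", not "accepted"
        return self.receive(pkt)   # synchronous: real verdict, nested call

    def run_backlog(self):
        """Drain queued packets later, outside the transmit path."""
        while self.backlog:
            self.receive(self.backlog.popleft())


if __name__ == "__main__":
    for use_backlog in (False, True):
        stack = ToyStack(use_backlog=use_backlog, ingress_drops=True)
        status = stack.transmit("data")
        stack.run_backlog()
        print(use_backlog, status, stack.verdicts, stack.max_depth)
```

In this model, the direct variant reports the drop to the sender but lets the receive path re-enter itself whenever a reply is generated; the backlog variant keeps the receive path at depth one, but the status returned at transmit time says nothing about what the ingress filter later decides. Whether that loss matters for real mirred users is the question raised above.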