Re: [PATCH v14 net-next 08/23] net/tcp: Add AO sign to RST packets
From: Dmitry Safonov <hidden>
Date: 2023-10-18 15:45:04
Also in:
lkml, oe-lkp
Hi Oliver, On 10/18/23 08:32, Oliver Sang wrote:
hi, Dmitry Safonov, On Wed, Oct 18, 2023 at 12:19:56AM +0100, Dmitry Safonov wrote:quoted
On 10/17/23 09:37, kernel test robot wrote:quoted
Hello, kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_net/core/sock.c" on: commit: df13d11e6a2a3cc5f973aca36f68f880fa42d55f ("[PATCH v14 net-next 08/23] net/tcp: Add AO sign to RST packets")
[..]
quoted
quoted
[ 221.348247][ T7133] BUG: sleeping function called from invalid context at net/core/sock.c:2978 [ 221.349875][ T7133] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 7133, name: trinity-c4 [ 221.351666][ T7133] preempt_count: 0, expected: 0 [ 221.352614][ T7133] RCU nest depth: 1, expected: 0 [ 221.353518][ T7133] 2 locks held by trinity-c4/7133: [ 221.354530][ T7133] #0: ed8b5660 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_sendmsg (net/ipv4/tcp.c:1336) [ 221.374314][ T7133] #1: c27dbb18 (rcu_read_lock){....}-{1:2}, at: inet6_csk_xmit (include/linux/rcupdate.h:747 net/ipv6/inet6_connection_sock.c:129) [ 221.375906][ T7133] CPU: 1 PID: 7133 Comm: trinity-c4 Tainted: G W TN 6.6.0-rc4-01105-gdf13d11e6a2a #1Quite puzzling. It doesn't seem that it can happen to inet6_csk_xmit(): : rcu_read_lock(); : skb_dst_set_noref(skb, dst); : : /* Restore final destination back after routing done */ : fl6.daddr = sk->sk_v6_daddr; : : res = ip6_xmit(sk, skb, &fl6, sk->sk_mark, rcu_dereference(np->opt), : np->tclass, sk->sk_priority); : rcu_read_unlock(); So, I presumed the calltrace was for nested rcu_read_lock() case. Then I've looked at all return/goto cases, I couldn't find any unbalanced rcu_read_{,un}lock(). Is this reproducible by any chance?do you mean how often it could be reproduced? we run the tests upon this commit and parent both up to 15 times. from below, parent shows quite clean, but there are various issues while running tests upon this patch.
Thanks a lot for running your tests over it! That's what a fresh brain does: I see that tcp_v6_send_reset() with neither CONFIG_TCP_MD5SIG nor CONFIG_TCP_AO set does grab rcu_read_lock() [as I wasn't sure at that moment that nothing but md5 key does rely on that being held], but the exit is covered with ifdefs: : #if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AO) : out: : if (allocated_traffic_key) : kfree(key.traffic_key); : rcu_read_unlock(); : #endif Seems to have been correct versions ago (version 6), but on refactoring when I've moved ifdeffs to tcp_parse_auth_options() declaration to pollute less code - missed this. Thanks again for your testing and reporting!
8468a6f4f3143 net/tcp: Add tcp_parse_auth_options()
df13d11e6a2a3 net/tcp: Add AO sign to RST packets
8468a6f4f3143ba2 df13d11e6a2a3cc5f973aca36f6
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:15 87% 13:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_arch/x86/entry/common.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_fs/dcache.c
:15 13% 2:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/percpu-rwsem.h
:15 60% 9:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c
:15 7% 1:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/sched/completion.c
:15 40% 6:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/task_work.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/iov_iter.c
:15 33% 5:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/strncpy_from_user.c
:15 20% 3:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_lib/usercopy.c
:15 13% 2:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_mm/vmscan.c
:15 100% 15:15 dmesg.BUG:sleeping_function_called_from_invalid_context_at_net/core/sock.c
:15 100% 15:15 dmesg.EIP:rcu_note_context_switch
:15 100% 15:15 dmesg.WARNING:at_kernel/rcu/tree_plugin.h:#rcu_note_context_switch
:15 100% 15:15 dmesg.WARNING:lock_held_when_returning_to_user_space
:15 100% 15:15 dmesg.is_leaving_the_kernel_with_locks_still_held[..]
Thanks,
Dmitry