Re: net: hang in unregister_netdevice: waiting for lo to become free
From: Neil Horman <nhorman@tuxdriver.com>
Date: 2018-02-20 16:27:20
Also in:
linux-sctp, lkml
On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala [off-list ref] wrote:
On 19.02.2018 20:59, Dmitry Vyukov wrote:
On Sat, Feb 3, 2018 at 1:15 PM, Xin Long [off-list ref] wrote:
On 1/30/18 1:57 PM, David Ahern wrote:
On 1/30/18 1:08 PM, Daniel Borkmann wrote:
On 01/30/2018 07:32 PM, Cong Wang wrote:
On Tue, Jan 30, 2018 at 4:09 AM, Dmitry Vyukov [off-list ref] wrote:
Hello,

The following program creates a hang in unregister_netdevice. cleanup_net work hangs there forever periodically printing "unregister_netdevice: waiting for lo to become free. Usage count = 3" and creation of any new network namespaces hangs forever.

Interestingly, this is not reproducible on net-next.

The most recent change on netns refcnt was 4ee806d51176 ("net: tcp: close sock if net namespace is exiting") in net/net-next from 5 days ago, maybe fixed due to that?

This appears to be the commit introducing the refcnt leak:

$ git bisect bad
dbc2b5e9a09e9a6664679a667ff81cff6e5f2641 is the first bad commit
commit dbc2b5e9a09e9a6664679a667ff81cff6e5f2641
Author: Xin Long [off-list ref]
Date: Fri May 12 14:39:52 2017 +0800

    sctp: fix src address selection if using secondary addresses for ipv6

v4.14 is bad. Running bisect in the background while doing other things....

Interesting. The commit that avoids the refcnt leak is

commit 955ec4cb3b54c7c389a9f830be7d3ae2056b9212
Author: David Ahern [off-list ref]
Date: Wed Jan 24 19:45:29 2018 -0800

    net/ipv6: Do not allow route add with a device that is down

That commit does not intentionally address the problem so it is just masking the problematic code introduced by the commit above.

Thanks, David A. I'm still on a trip. will look into this asap.

Alexey and Tommi already had the patches for this issue on both SCTP v4 and v6 dst_get, Thanks.

Is this meant to be fixed already? I am still seeing this on the latest upstream tree.

These two commits are in v4.16-rc1:

commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
Author: Tommi Rantala [off-list ref]
Date: Mon Feb 5 21:48:14 2018 +0200

    sctp: fix dst refcnt leak in sctp_v4_get_dst
    ...
    Fixes: 410f03831 ("sctp: add routing output fallback")
    Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses")

commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
Author: Alexey Kodanev [off-list ref]
Date: Mon Feb 5 15:10:35 2018 +0300

    sctp: fix dst refcnt leak in sctp_v6_get_dst()
    ...
    Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6")

I guess we missed something if it's still reproducible. I can check it later this week, unless someone else beat me to it.

Hi Tommi,

Hmmm, I can't claim that it's exactly the same bug. Perhaps it's another one then. But I am still seeing these:

[ 58.799130] unregister_netdevice: waiting for lo to become free. Usage count = 4
[ 60.847138] unregister_netdevice: waiting for lo to become free. Usage count = 4
[ 62.895093] unregister_netdevice: waiting for lo to become free. Usage count = 4
[ 64.943103] unregister_netdevice: waiting for lo to become free. Usage count = 4

on upstream tree pulled ~12 hours ago.
Can you write a systemtap script to probe dev_hold and dev_put, printing out a backtrace if the device name matches "lo"? That should tell us definitively whether the problem is in the same location or not.

Neil
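Once a trace of dev_hold and dev_put calls with backtraces has been collected, one way to read it is to tally events per call path, so that a path taking references without a matching release stands out. The following is a minimal Python sketch of that post-processing step, not a SystemTap script. It assumes a hypothetical trace format (an unindented line "dev_hold <dev>" or "dev_put <dev>", followed by indented backtrace frames); the function names parse_events and summarize and the frame names in the sample are made up for illustration and do not come from a real trace.

```python
from collections import Counter

# Hypothetical trace text: NOT real SystemTap output, frame names are placeholders.
SAMPLE_TRACE = """
dev_hold lo
    frame_a
    frame_b
dev_put lo
    frame_c
dev_hold lo
    frame_a
    frame_b
dev_hold eth0
    frame_d
"""

def parse_events(text, dev="lo"):
    """Return a list of (op, frames) for dev_hold/dev_put events on `dev`."""
    events = []
    op, frames = None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented line: a backtrace frame belonging to the current event.
            if op is not None:
                frames.append(line.strip())
            continue
        # Unindented line starts a new event; store the previous one first.
        if op is not None:
            events.append((op, tuple(frames)))
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("dev_hold", "dev_put") and parts[1] == dev:
            op, frames = parts[0], []
        else:
            # Event for another device or an unknown line: ignore its frames.
            op, frames = None, []
    if op is not None:
        events.append((op, tuple(frames)))
    return events

def summarize(events):
    """Return (holds minus puts, Counter of events keyed by (op, backtrace))."""
    per_stack = Counter(events)
    balance = sum(1 if op == "dev_hold" else -1 for op, _ in events)
    return balance, per_stack

events = parse_events(SAMPLE_TRACE)
balance, per_stack = summarize(events)
print("net references still held on lo:", balance)
for (op, frames), count in per_stack.most_common():
    print(count, op, " <- ".join(frames))
```

A positive balance only says that references are outstanding; the per-backtrace counts are what point at the call path to inspect.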
The kernel does not detect this as any kind of BUG/WARNING, so syzkaller/syzbot do not catch it as a bug and do not try to reproduce, localize and report it.
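Since the condition only shows up as a repeated log message, a harness that wants to flag it has to match the message text itself. The following is a minimal Python sketch under that assumption. It is not how syzkaller works; the function name find_stuck_devices and the min_repeats threshold are illustrative choices, and the sample lines are the ones quoted earlier in this thread.

```python
import re

# Matches the message text quoted in this thread.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. Usage count = (\d+)"
)

SAMPLE_LOG = [
    "[ 58.799130] unregister_netdevice: waiting for lo to become free. Usage count = 4",
    "[ 60.847138] unregister_netdevice: waiting for lo to become free. Usage count = 4",
    "[ 62.895093] unregister_netdevice: waiting for lo to become free. Usage count = 4",
    "[ 64.943103] unregister_netdevice: waiting for lo to become free. Usage count = 4",
]

def find_stuck_devices(log_lines, min_repeats=3):
    """Return {(device, usage_count): repeats} for messages seen at least min_repeats times.

    min_repeats is an arbitrary threshold chosen for this sketch.
    """
    counts = {}
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            counts[key] = counts.get(key, 0) + 1
    return {key: n for key, n in counts.items() if n >= min_repeats}

print(find_stuck_devices(SAMPLE_LOG))
```

Requiring several repeats with an unchanged usage count is meant to separate a leaked reference from a device that is merely slow to be released.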