Re: WARNING: bad unlock balance in sch_direct_xmit
From: Taehee Yoo <ap420073@gmail.com>
Date: 2020-01-10 03:06:52
On Fri, 10 Jan 2020 at 08:38, Cong Wang [off-list ref] wrote:
On Wed, Jan 8, 2020 at 3:43 AM Taehee Yoo [off-list ref] wrote:quoted
On Wed, 8 Jan 2020 at 09:34, Cong Wang [off-list ref] wrote:quoted
On Tue, Jan 7, 2020 at 3:31 AM Taehee Yoo [off-list ref] wrote:quoted
After "ip link set team0 master team1", the "team1 -> team0" locking path will be recorded in lockdep key of both team1 and team0. Then, if "ip link set team1 master team0" is executed, "team0 -> team1" locking path also will be recorded in lockdep key. At this moment, lockdep will catch possible deadlock situation and it prints the above warning message. But, both "team0 -> team1" and "team1 -> team0" will not be existing concurrently. so the above message is actually wrong. In order to avoid this message, a recorded locking path should be removed. So, both lockdep_unregister_key() and lockdep_register_key() are needed.So, after you move the key down to each netdevice, they are now treated as different locks. Is this stacked device scenario the reason why you move it to per-netdevice? If so, I wonder why not just use nested locks? Like: netif_addr_nested_lock(upper, 0); netif_addr_nested_lock(lower, 1); netif_addr_nested_unlock(lower); netif_addr_nested_unlock(upper); For this case, they could still share a same key. Thanks for the details!Yes, the reason for using dynamic lockdep key is to avoid lockdep warning in stacked device scenario. But, the addr_list_lock case is a little bit different. There was a bug in netif_addr_lock_nested() that "dev->netdev_ops->ndo_get_lock_subclass" isn't updated after "master" and "nomaster" command. So, the wrong subclass is used, so lockdep warning message was printed.Hmm? I never propose netdev_ops->ndo_get_lock_subclass(), and the subclasses are always 0,1, no matter which is the master device, so it doesn't need a ops.
It's just the reason why the dynamic lockdep key was adopted instead of a nested lock.
quoted
There were some ways to fix this problem, using dynamic key is just one of them. I think using the correct subclass in netif_addr_lock_nested() is also a correct way to fix that problem. Another minor reason was that the subclass is limited by 8. but dynamic key has no limitation.Yeah, but in practice I believe 8 is sufficient for stacked devices.
I agree with this.
quoted
Unfortunately, dynamic key has a problem too. lockdep limits the maximum number of lockdep keys.Right, and also the problem reported by syzbot, that is not safe during unregister and register.
qdisc_xmit_lock_key has this problem. But, I'm not sure about addr_list_lock_key. If addr_list_lock is used outside of RTNL, it has this problem. If it isn't used outside of RTNL, it doesn't have this problem.
Anyway, do you think we should revert back to the static keys and use subclass to address the lockdep issue instead? Thanks!
I agree with this to reduce the number of dynamic lockdep keys. Thanks a lot!