Re: WARNING: bad unlock balance in sch_direct_xmit
From: Cong Wang <hidden>
Date: 2020-01-11 21:53:48
On Thu, Jan 9, 2020 at 10:02 PM Taehee Yoo [off-list ref] wrote:
ndo_get_lock_subclass() was used to calculate subclass which was used by
netif_addr_lock_nested().
-static inline void netif_addr_lock_nested(struct net_device *dev)
-{
- int subclass = SINGLE_DEPTH_NESTING;
-
- if (dev->netdev_ops->ndo_get_lock_subclass)
- subclass = dev->netdev_ops->ndo_get_lock_subclass(dev);
-
- spin_lock_nested(&dev->addr_list_lock, subclass);
-}
The most important thing about nested lock is to get the correct subclass.
nest_level was used as subclass and this was calculated by
->ndo_get_lock_subclass().
But, ->ndo_get_lock_subclass() didn't calculate correct subclass.
After "master" and "nomaster" operations, nest_level should be updated
recursively, but it didn't. So incorrect subclass was used.
team3 <-- subclass 0
"ip link set team3 master team2"
team2 <-- subclass 0
team3 <-- subclass 1
"ip link set team2 master team1"
team1 <-- subclass 0
team3 <-- subclass 1
team3 <-- subclass 1
"ip link set team1 master team0"
team0 <-- subclass 0
team1 <-- subclass 1
team3 <-- subclass 1
team3 <-- subclass 1
After "master" and "nomaster" operation, subclass values of all lower or
upper interfaces would be changed. But ->ndo_get_lock_subclass()
didn't update subclass recursively, lockdep warning appeared.
In order to fix this, I had two ways.
1. use dynamic keys instead of static keys.
2. fix ndo_get_lock_subclass().
The reason why I adopted using dynamic keys instead of fixing
->ndo_get_lock_subclass() is that the ->ndo_get_lock_subclass() isn't
a common helper function.
So, driver writers should implement ->ndo_get_lock_subclass().
If we use dynamic keys, ->ndo_get_lock_subclass() code could be removed.The details you provide here are really helpful for me to understand the reasons behind your changes. Let me think about this and see how I could address both problems. This appears to be harder than I originally thought.
What I fixed problems with dynamic lockdep keys could be fixed by nested lock too. I think if the subclass value synchronization routine works well, there will be no problem.
Great! We are on the same page. Thanks for all the information and the reproducer too!