Re: Correct usage of dev_base_lock in 2020
From: Vladimir Oltean <olteanv@gmail.com>
Date: 2020-11-30 19:04:48
On Mon, Nov 30, 2020 at 08:00:40PM +0100, Eric Dumazet wrote:
On 11/30/20 7:48 PM, Vladimir Oltean wrote:quoted
On Mon, Nov 30, 2020 at 10:14:05AM -0800, Jakub Kicinski wrote:quoted
On Mon, 30 Nov 2020 11:41:10 +0100 Eric Dumazet wrote:quoted
quoted
So dev_base_lock dates back to the Big Kernel Lock breakup back in Linux 2.4 (ie before my time). The time has come to get rid of it. The use is sysfs is because could be changed to RCU. There have been issues in the past with sysfs causing lock inversions with the rtnl mutex, that is why you will see some trylock code there. My guess is that dev_base_lock readers exist only because no one bothered to do the RCU conversion.I think we did, a long time ago. We took care of all ' fast paths' already. Not sure what is needed, current situation does not bother me at all ;)Perhaps Vladimir has a plan to post separately about it (in that case sorry for jumping ahead) but the initial problem was procfs which is (hopefully mostly irrelevant by now, and) taking the RCU lock only therefore forcing drivers to have re-entrant, non-sleeping .ndo_get_stats64 implementations.Right, the end reason why I'm even looking at this is because I want to convert all callers of dev_get_stats to use sleepable context and not atomic. This makes it easier to gather statistics from devices that have a firmware, or off-chip devices behind a slow bus like SPI. Like Jakub pointed out, some places call dev_get_stats while iterating through the list of network interfaces - one would be procfs, but not only. These callers are pure readers, so they use RCU protection. But that gives us atomic context when calling dev_get_stats. The naive solution is to convert all those callers to hold the RTNL mutex, which is the writer-side protection for the network interface lists, and which is sleepable. In fact I do have a series of 8 patches where I get that done. But there are some weirder cases, such as the bonding driver, where I need to do this: -----------------------------[cut here]----------------------------- From 369a0e18a2446cda8ff52d72c02aa144ae6687ec Mon Sep 17 00:00:00 2001 From: Vladimir Oltean <vladimir.oltean@nxp.com> Date: Mon, 30 Nov 2020 02:39:46 +0200 Subject: [PATCH] net: bonding: retrieve device statistics under RTNL, not RCU In the effort of making .ndo_get_stats64 be able to sleep, we need to ensure the callers of dev_get_stats do not use atomic context. The bonding driver uses an RCU read-side critical section to ensure the integrity of the list of network interfaces, because the driver iterates through all net devices in the netns to find the ones which are its configured slaves. We still need some protection against an interface registering or deregistering, and the writer-side lock, the RTNL mutex, is fine for that, because it offers sleepable context. We are taking the RTNL this way (checking for rtnl_is_locked first) because the RTNL is not guaranteed to be held by all callers of ndo_get_stats64, in fact there will be work in the future that will avoid as much RTNL-holding as possible. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> --- drivers/net/bonding/bond_main.c | 18 +++++++----------- include/net/bonding.h | 1 - 2 files changed, 7 insertions(+), 12 deletions(-)diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index e0880a3840d7..1d44534e95d2 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c@@ -3738,21 +3738,17 @@ static void bond_get_stats(struct net_device *bond_dev, struct rtnl_link_stats64 *stats) { struct bonding *bond = netdev_priv(bond_dev); + bool rtnl_locked = rtnl_is_locked(); struct rtnl_link_stats64 temp; struct list_head *iter; struct slave *slave; - int nest_level = 0; + if (!rtnl_locked) + rtnl_lock();Gosh, do not do that. Convert the bonding ->stats_lock to a mutex instead. Adding more reliance to RTNL is not helping cases where access to stats should not be blocked by other users of RTNL (which can be abused)
I can't, Eric. The bond_for_each_slave() macro needs protection against net devices being registered and unregistered.