Re: [PATCH] net: fix possible deadlocks in rtnl_trylock/unlock
From: Eric Dumazet <hidden>
Date: 2012-12-01 18:36:17
Also in:
batman
On Sat, 2012-12-01 at 18:29 +0100, Simon Wunderlich wrote:
quoted hunk ↗ jump to hunk
If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock() may destroy this attempt because it first unlocks rtnl_mutex but may lock it again later. The callgraph roughly looks like: rtnl_unlock() netdev_run_todo() __rtnl_unlock() netdev_wait_allrefs() rtnl_lock() ... __rtnl_unlock() There are two users which have possible deadlocks and are fixed in this patch: batman-adv and bridge. Their problematic access pattern is the same: notifier_call handler: * holds rtnl lock when called * calls sysfs code at some point (acquiring sysfs locks) sysfs code: * holds sysfs lock when called * calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock implicitly Fix this by exporting __rtnl_unlock() to just unlock the mutex without implicitly locking again. Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <redacted> --- Of course, an alternative would be to not lock again after unlocking within rtnl_unlock(), or put the todo handling into the locked section. I'm not familiar enough with this code to know what would be best. There are others using rtnl_trylock(), but I'm not sure if they are affected. --- net/batman-adv/sysfs.c | 2 +- net/bridge/br_sysfs_br.c | 2 +- net/bridge/br_sysfs_if.c | 2 +- net/core/rtnetlink.c | 1 + 4 files changed, 4 insertions(+), 3 deletions(-)diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 66518c7..41b74aa 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c@@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj, ret = batadv_hardif_enable_interface(hard_iface, buff); unlock: - rtnl_unlock(); + __rtnl_unlock(); out: batadv_hardif_free_ref(hard_iface); return ret;diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c@@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d, if (!rtnl_trylock()) return restart_syscall(); br_stp_set_enabled(br, val); - rtnl_unlock(); + __rtnl_unlock(); return len; }
I have no idea of why you believe there is a problem here. Could you explain how net_todo_list could be not empty ? As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.