Re: [PATCH net-next v3 2/7] net: lan966x: Split lan966x_fdb_event_work

From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: 2022-07-02 14:08:42
Also in: lkml

On Fri, Jul 01, 2022 at 10:52:22PM +0200, Horatiu Vultur wrote:

Split the function lan966x_fdb_event_work. One case for when the
orig_dev is a bridge and one case when orig_dev is lan966x port.
This is preparation for lag support. There is no functional change.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---

-static void lan966x_fdb_event_work(struct work_struct *work)
+void lan966x_fdb_flush_workqueue(struct lan966x *lan966x)
+{
+	flush_workqueue(lan966x->fdb_work);
+}
+


diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
index df2bee678559..d9fc6a9a3da1 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_switchdev.c

@@ -320,9 +320,10 @@ static int lan966x_port_prechangeupper(struct net_device *dev,
 {
 	struct lan966x_port *port = netdev_priv(dev);
 
-	if (netif_is_bridge_master(info->upper_dev) && !info->linking)
-		switchdev_bridge_port_unoffload(port->dev, port,
-						NULL, NULL);
+	if (netif_is_bridge_master(info->upper_dev) && !info->linking) {
+		switchdev_bridge_port_unoffload(port->dev, port, NULL, NULL);
+		lan966x_fdb_flush_workqueue(port->lan966x);
+	}

Very curious as to why you decided to stuff this change in here.
There was no functional change in v2, now there is. And it's a change
you might need to come back to later (probably sooner than you'd like),
since the flushing of the workqueue is susceptible to causing deadlocks
if done improperly - let's see how you blame a commit that was only
supposed to move code, in that case ;)

The deadlock that I'm talking about comes from the fact that
lan966x_port_prechangeupper() runs with rtnl_lock() held. So the code of
the flushed workqueue item must not hold rtnl_lock(), or any other lock
that is blocked by the rtnl_lock(). Otherwise, the flushing will wait
for a workqueue item to complete, which in turn waits to acquire the
rtnl_lock, which is held by the thread waiting for the workqueue to complete.
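
The wait cycle described above can be sketched as a tiny user-space
model. This is only an illustration, not driver code: the actor names
and has_deadlock() are hypothetical, and each actor is assumed to block
on at most one other actor at a time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors, named for this illustration only. */
enum actor { NOBODY = -1, FLUSHER, WORK_ITEM, N_ACTORS };

/*
 * waits_for[a] is the actor that 'a' is currently blocked on, or NOBODY.
 * Returns true if following the chain from 'start' leads back to 'start',
 * i.e. 'start' is part of a wait cycle and can never make progress.
 */
bool has_deadlock(const int waits_for[N_ACTORS], int start)
{
	int cur = start;

	for (int steps = 0; steps < N_ACTORS; steps++) {
		assert(cur >= 0 && cur < N_ACTORS);
		cur = waits_for[cur];
		if (cur == NOBODY)
			return false;	/* chain ends, somebody can run */
		if (cur == start)
			return true;	/* back where we began: cycle */
	}
	return false;
}
```

With FLUSHER blocked on WORK_ITEM (the flush) and WORK_ITEM blocked on
FLUSHER (the rtnl_lock holder), has_deadlock() reports a cycle. If the
work item never takes rtnl_lock, its entry is NOBODY and the chain ends.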

Analyzing your code, lan966x_mac_notifiers() takes rtnl_lock().
That happens in threaded interrupt context, in lan966x_mac_irq_process(),
but nested inside spin_lock(&lan966x->mac_lock).

There are 2 problems with that already. The first: rtnl_lock() is a
mutex => can sleep, but &lan966x->mac_lock is a spin lock => is atomic.
You can't take rtnl_lock() from atomic context. Lockdep and/or
CONFIG_DEBUG_ATOMIC_SLEEP will tell you as much.
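
As a rough user-space model of the rule being broken here (not the
kernel's implementation; every toy_* name below is made up for
illustration), holding a spin lock can be thought of as raising an
"atomic depth" counter, and sleeping is only legal while it is zero:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy execution context: counts how many spin locks are held. */
struct toy_ctx {
	int atomic_depth;
};

/* Taking a (toy) spin lock enters atomic context. */
void toy_spin_lock(struct toy_ctx *c)
{
	c->atomic_depth++;
}

/* Releasing it leaves atomic context again. */
void toy_spin_unlock(struct toy_ctx *c)
{
	assert(c->atomic_depth > 0);
	c->atomic_depth--;
}

/* A sleeping lock such as a mutex may only be taken when this is true. */
bool toy_may_sleep(const struct toy_ctx *c)
{
	return c->atomic_depth == 0;
}
```

In this model, taking a mutex between toy_spin_lock() and
toy_spin_unlock() is exactly the case where toy_may_sleep() is false.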

The second problem is the lock ordering inversion that this causes.
There exists a threaded IRQ which takes the locks in the order mac_lock
-> rtnl_lock, and there exists this new fdb_flush_workqueue which takes
the locks in the order rtnl_lock -> mac_lock. If they run at the same
time, kaboom. Again, lockdep will tell you as much.
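
To make the inversion concrete, here is a minimal sketch of the kind of
bookkeeping that catches it. It is far simpler than what lockdep really
does (it only notices a direct two-lock inversion, not longer chains),
and the names order_graph and record_order() are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* The two locks from the discussion above. */
enum lock_id { MAC_LOCK, RTNL_LOCK, N_LOCKS };

/* before[a][b] is true once 'b' has been taken while holding 'a'. */
struct order_graph {
	bool before[N_LOCKS][N_LOCKS];
};

/*
 * Record that 'inner' was acquired while 'outer' was held.
 * Returns false if the opposite order was already recorded,
 * which means two code paths disagree about the lock order.
 */
bool record_order(struct order_graph *g, enum lock_id outer,
		  enum lock_id inner)
{
	assert(outer < N_LOCKS && inner < N_LOCKS && outer != inner);

	if (g->before[inner][outer])
		return false;	/* inversion: possible ABBA deadlock */

	g->before[outer][inner] = true;
	return true;
}
```

Recording mac_lock -> rtnl_lock (the threaded IRQ) succeeds; recording
rtnl_lock -> mac_lock afterwards (the flush path) returns false.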

I'm sorry, but you need to solve the existing locking problems with the
code first.

 
 	return NOTIFY_DONE;
 }
-- 
2.33.0
