Re: [PATCH v2 net-next 2/4] net: dsa: Link aggregation support
From: Vladimir Oltean <olteanv@gmail.com>
Date: 2020-12-01 01:38:16
Hi Tobias, On Mon, Nov 30, 2020 at 03:06:08PM +0100, Tobias Waldekranz wrote:
quoted hunk ↗ jump to hunk
Monitor the following events and notify the driver when: - A DSA port joins/leaves a LAG. - A LAG, made up of DSA ports, joins/leaves a bridge. - A DSA port in a LAG is enabled/disabled (enabled meaning "distributing" in 802.3ad LACP terms). Each LAG interface to which a DSA port is attached is represented by a `struct dsa_lag` which is globally reachable from the switch tree and from each associated port. When a LAG joins a bridge, the DSA subsystem will treat that as each individual port joining the bridge. The driver may look at the port's LAG pointer to see if it is associated with any LAG, if that is required. This is analogue to how switchdev events are replicated out to all lower devices when reaching e.g. a LAG. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> --- include/net/dsa.h | 97 +++++++++++++++++++++++++++++++ net/dsa/dsa2.c | 51 ++++++++++++++++ net/dsa/dsa_priv.h | 31 ++++++++++ net/dsa/port.c | 141 +++++++++++++++++++++++++++++++++++++++++++++ net/dsa/slave.c | 83 ++++++++++++++++++++++++-- net/dsa/switch.c | 49 ++++++++++++++++ 6 files changed, 446 insertions(+), 6 deletions(-) +static inline struct dsa_lag *dsa_lag_by_id(struct dsa_switch_tree *dst, int id) +{ + if (!test_bit(id, dst->lags.busy)) + return NULL; + + return &dst->lags.pool[id]; +} + +static inline struct net_device *dsa_lag_dev_by_id(struct dsa_switch_tree *dst, + int id) +{ + struct dsa_lag *lag = dsa_lag_by_id(dst, id); + + return lag ? rcu_dereference(lag->dev) : NULL; +} + +static inline struct dsa_lag *dsa_lag_by_dev(struct dsa_switch_tree *dst, + struct net_device *dev) +{ + struct dsa_lag *lag; + int id; + + dsa_lag_foreach(id, dst) { + lag = dsa_lag_by_id(dst, id); + + if (rtnl_dereference(lag->dev) == dev) + return lag; + } + + return NULL; +}diff --git a/net/dsa/port.c b/net/dsa/port.c index 73569c9af3cc..c2332ee5f5c7 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c@@ -193,6 +193,147 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br) dsa_port_set_state_now(dp, BR_STATE_FORWARDING); } +static struct dsa_lag *dsa_lag_get(struct dsa_switch_tree *dst, + struct net_device *dev) +{ + struct dsa_lag *lag; + int id; + + lag = dsa_lag_by_dev(dst, dev); + if (lag) { + kref_get(&lag->refcount); + return lag; + } + + id = find_first_zero_bit(dst->lags.busy, dst->lags.num); + if (id >= dst->lags.num) { + WARN(1, "No LAGs available"); + return NULL; + } + + set_bit(id, dst->lags.busy); + + lag = &dst->lags.pool[id]; + kref_init(&lag->refcount); + lag->id = id; + INIT_LIST_HEAD(&lag->ports); + INIT_LIST_HEAD(&lag->tx_ports); + + rcu_assign_pointer(lag->dev, dev); + return lag; +} + +static void dsa_lag_release(struct kref *refcount) +{ + struct dsa_lag *lag = container_of(refcount, struct dsa_lag, refcount); + + rcu_assign_pointer(lag->dev, NULL); + synchronize_rcu(); + memset(lag, 0, sizeof(*lag)); +}
What difference does it make if lag->dev is set to NULL right away or after a grace period? Squeezing one last packet from that bonding interface? Pointer updates are atomic operations on all architectures that the kernel supports, and, as long as you use WRITE_ONCE and READ_ONCE memory barriers, there should be no reason for RCU protection that I can see. And unlike typical uses of RCU, you do not free lag->dev, because you do not own lag->dev. Instead, the bonding interface pointed to by lag->dev is going to be freed (in case of a deletion using ip link) after an RCU grace period anyway. And the receive data path is under an RCU read-side critical section anyway. So even if you set lag->dev to NULL using WRITE_ONCE, the existing in-flight readers from the RX data path that had called dsa_lag_dev_by_id() will still hold a reference to a valid bonding interface.