Thread (43 messages, 6 authors, 2020-11-19)

Re: [RFC PATCH 0/4] net: dsa: link aggregation support

From: Tobias Waldekranz <tobias@waldekranz.com>
Date: 2020-10-27 19:38:06

On Tue, Oct 27, 2020 at 21:00, Vladimir Oltean [off-list ref] wrote:
> On Tue, Oct 27, 2020 at 07:25:16PM +0100, Tobias Waldekranz wrote:
>>> 1) trunk user ports, with team/bonding controlling it
>>> 2) trunk DSA ports, i.e. the ports between switches in a D-in-DSA (multi-switch) setup
>>> 3) trunk CPU ports.
>> [...]
>> I think that (2) and (3) are essentially the same problem, i.e. creating
>> LAGs out of DSA links, be they switch-to-switch or switch-to-CPU
>> connections. I think you are correct that the CPU port cannot be a
>> LAG/trunk, but I believe that limitation only applies to TO_CPU packets.
>
> Which would still be ok? They are called "slow protocol PDUs" for a reason.

Oh yes, completely agree. That was the point I was trying to make :)

>> In order for this to work on transmit, we need to add forward offloading
>> to the bridge so that we can, for example, send one FORWARD from the CPU
>> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> That surely sounds like an interesting (and tough to implement)
> optimization to increase the throughput, but why would it be _needed_
> for things to work? What's wrong with 4 FROM_CPU packets?
We have internal patches that do this, and I can confirm that it is
tough :) I would really like to find a way to solve this that would
also be acceptable upstream. I have some ideas; it is on my TODO list.

In a single-chip system, I agree that it is not needed; the CPU can do
the load balancing in software. But for the hardware to load-balance
traffic on a switch-to-switch LAG, you need to send a FORWARD.

FROM_CPU packets simply follow whatever is in the device mapping table,
so they are not load-balanced across the LAG members. You essentially
have the inverse of the TO_CPU problem, except that on Tx, FROM_CPU
packets would make up 100% of the traffic.
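To make the load-balancing argument concrete, here is a toy model of a two-member switch-to-switch LAG. It assumes, hypothetically, that every FROM_CPU frame towards the remote switch leaves through the single member named in the device mapping entry, while FORWARD frames pick a member by hashing addresses. The XOR of the last source and destination MAC bytes is a stand-in, not the hash any particular switch implements, and `distribute()` is an illustrative helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAG_MEMBERS 2 /* hypothetical two-link LAG between two switches */

/* Hypothetical device mapping entry: all FROM_CPU frames towards the
 * remote switch egress through this one LAG member. */
static const unsigned int devmap_member = 0;

/* Stand-in hash (assumption): XOR of the last SA and DA bytes. */
static unsigned int lag_hash(const uint8_t sa[6], const uint8_t da[6])
{
	return (unsigned int)(sa[5] ^ da[5]) % LAG_MEMBERS;
}

/* Count frames per LAG member for n_flows flows. Each flow comes from a
 * different source MAC (last byte = flow index) to one fixed destination. */
static void distribute(bool use_forward, unsigned int n_flows,
		       unsigned int count[LAG_MEMBERS])
{
	uint8_t sa[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00 };
	const uint8_t da[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x10 };

	for (unsigned int m = 0; m < LAG_MEMBERS; m++)
		count[m] = 0;

	for (unsigned int i = 0; i < n_flows; i++) {
		sa[5] = (uint8_t)i;
		/* FORWARD: hardware hashing; FROM_CPU: fixed device map. */
		unsigned int member = use_forward ? lag_hash(sa, da) : devmap_member;
		assert(member < LAG_MEMBERS);
		count[member]++;
	}
}
```

With eight flows, `distribute(false, 8, count)` puts all of them on member 0, while `distribute(true, 8, count)` splits them four and four, which is the difference between FROM_CPU and FORWARD on Tx.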

Beyond that, there are some things that, while strictly speaking
possible without FORWARDs, become much easier to deal with:

- Multicast routing. This is one case where performance _really_ suffers
  from having to skb_clone() to each recipient.

- Bridging between virtual interfaces and DSA ports. A typical example is
  an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
  cannot perform SA learning, which means that once you bridge traffic
  from the VPN out to a DSA port, the return traffic will be classified
  as unknown unicast by the switch and be flooded everywhere.
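The flooding problem in the last point can be modeled with a toy forwarding database. This sketch assumes, as described above, that the switch learns source addresses from FORWARD frames but not from FROM_CPU frames. A frame from a host behind the VPN is injected on the CPU port, and a later lookup for that host either finds the CPU port or misses and returns the hypothetical `PORT_FLOOD` sentinel. `cpu_inject()`, `fdb_learn()`, `fdb_lookup()` and the CPU port number are illustrative, not the mv88e6xxx driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16
#define PORT_FLOOD (-1) /* hypothetical sentinel: unknown unicast, flood */

struct fdb_entry {
	uint8_t mac[6];
	int port;
	bool valid;
};

static struct fdb_entry fdb[FDB_SIZE];

static void fdb_flush(void)
{
	memset(fdb, 0, sizeof(fdb));
}

/* Learn (or move) a station: record that mac lives behind port. */
static void fdb_learn(const uint8_t mac[6], int port)
{
	assert(port >= 0);
	for (int i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0) {
			fdb[i].port = port; /* station moved: update the entry */
			return;
		}
	}
	for (int i = 0; i < FDB_SIZE; i++) {
		if (!fdb[i].valid) {
			memcpy(fdb[i].mac, mac, 6);
			fdb[i].port = port;
			fdb[i].valid = true;
			return;
		}
	}
	/* Table full: a real switch would age out entries; this sketch
	 * silently skips learning. */
}

/* Destination lookup: known port, or PORT_FLOOD on a miss. */
static int fdb_lookup(const uint8_t mac[6])
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (fdb[i].valid && memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	return PORT_FLOOD;
}

/* Frame injected by the CPU on cpu_port. In this model only FORWARD
 * frames are subject to SA learning; FROM_CPU frames are not. */
static void cpu_inject(bool forward, const uint8_t sa[6], int cpu_port)
{
	if (forward)
		fdb_learn(sa, cpu_port);
	/* Egress towards the destination DSA port is not modeled. */
}
```

After `cpu_inject(false, host, 6)`, `fdb_lookup(host)` returns `PORT_FLOOD`, so the return traffic would be flooded. After `cpu_inject(true, host, 6)` it returns 6, and the return traffic goes straight to the CPU port.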