Re: [patch net-next RFC 0/4] introduce infrastructure for support of switch chip datapath
From: Jiri Pirko <jiri@resnulli.us>
Date: 2014-03-22 09:40:11
Thu, Mar 20, 2014 at 06:21:10PM CET, f.fainelli@gmail.com wrote:
2014-03-20 5:40 GMT-07:00 Jiri Pirko [off-list ref]:
Thu, Mar 20, 2014 at 12:49:07PM CET, jhs@mojatatu.com wrote:
Hi Jiri,

On 03/19/14 11:33, Jiri Pirko wrote:
This is just an early draft, RFC. I wanted to post this early to get feedback as soon as possible. The basic idea is to introduce a generic infrastructure to support various switch chips in the kernel. The idea is also to benefit from the currently existing Open vSwitch userspace infrastructure.

I think the abstraction should be a netdev, and to be specific the bridge, not openvswitch. Our current tools like ifconfig, iproute2, bridge etc. should continue to work.

That is exactly the case. Nothing is specific to OVS. OVS is just one method to access the switchdev API. The abstraction is a netdev: one netdev per switch port, and one netdev as a master on top of those representing the switch itself.
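The "one netdev per port plus one master netdev" layout described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `model_netdev`, `model_switch`, `model_switch_init` and the `<switch>p<N>` naming are hypothetical and are not kernel structures or the API proposed in the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PORTS 8

/* Hypothetical model of a netdev: a name and an optional master. */
struct model_netdev {
    char name[16];
    const struct model_netdev *master; /* NULL for the switch master itself */
};

/* Hypothetical model of a switch: one master netdev and N port netdevs. */
struct model_switch {
    struct model_netdev master;
    struct model_netdev ports[MODEL_MAX_PORTS];
    int nports;
};

/* Create the master and enslave every port netdev to it.
 * Returns 0 on success, -1 if nports is out of range. */
static int model_switch_init(struct model_switch *sw, const char *name, int nports)
{
    assert(sw != NULL && name != NULL);
    if (nports < 0 || nports > MODEL_MAX_PORTS)
        return -1;

    memset(sw, 0, sizeof(*sw));
    snprintf(sw->master.name, sizeof(sw->master.name), "%s", name);
    sw->master.master = NULL;
    sw->nports = nports;

    for (int i = 0; i < nports; i++) {
        /* Assumed naming scheme: <switch>p<port number>, starting at 1. */
        snprintf(sw->ports[i].name, sizeof(sw->ports[i].name),
                 "%sp%d", name, i + 1);
        sw->ports[i].master = &sw->master;
    }
    return 0;
}
```

In this model, tools that operate on ordinary netdevs would see each port by name, while the master stands for the switch as a whole.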
In my experience, it is sufficient to model a switch after the Linux bridge at the basic level if the starting point is L2 (which is the lowest common denominator). Then you add the capabilities that different chips expose. Not every chip can do vxlan, flows etc., and we already know how to abstract those out. In my experience on top of Broadcom chips, the approach I described works rather well. Additionally, note: we do have L2 devices that offload in the kernel (refer to DSA, the earlier posting from the OpenWrt guys, and the Intel devices which do VMDq etc.). By my count we have 5 different approaches if we add yours.

I think the problem is that each solution serves a different purpose. For example, DSA is for switches connected as a PHY to a MAC. That is a completely different case from what my switchdev API is trying to handle.

I agree with Jamal here, we should try to find a solution that fits most users. It seems to me that there are 3 switch categories:
- enterprise built-in switches in NICs that support VF/PF
- embedded/enterprise switches that support tagging (Marvell eDSA/DSA, Broadcom tags)
- embedded switches that only support 802.1q VLANs
One case which you maybe forgot:
        switch chip
  ------------------------
  |  |  |  |  |  |    |              CPU
  p1 p2 ...pn px py MNGMNT       -----------
              |  |    |             pcie
              |  |    |        ---------------
              |  |    |          | NIC0 NIC1
              |  |    ---pcie-----  |   |
              |  ------someMII-------   |
              ---------someMII-----------
NIC0 and NIC1 are ordinary NICs, like 8139too for example, with no
notion that they are connected to a switch. They are completely
independent of the mngmnt iface.
The first category is more flow-oriented than control-oriented, whereas the last two are more "event and control" oriented: you usually have a system where the switch is configured not to flood the CPU port if possible, and when it does, it is to perform specific configuration (address learning, port protection, snooping, authorization...).

DSA is not designed specifically for switches which are connected to a MAC and appear as a regular PHY. This is how it first started, but nothing prevents you from using DSA with a switch that is, e.g., memory mapped into your CPU register space; MDIO is just the transport for the control part.
I see that DSA is now *very* MII-oriented. I'm not sure how hard it would be to rewrite it to be more generic, or whether it would make sense at all.
For instance, if my switches support an N-byte tag that gives me a reason code for receiving the frame and a bitmap representing the originating port, how would you imagine this fitting into the openvswitch/switchdev model, aside from the netdev per port? Do you think we could easily migrate existing DSA users to openvswitch/switchdev by handling the custom switch tag?
I do not think so either.
-- Florian
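
To make the question about custom switch tags concrete, here is a minimal sketch of parsing a tag that carries a reason code and an originating-port bitmap. The layout is purely an assumption for illustration (4 bytes: reason code, one reserved byte, 16-bit big-endian port bitmap); it is not the Broadcom or Marvell tag format, and `hyp_tag`, `hyp_tag_parse` and `hyp_tag_first_port` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_TAG_LEN 4 /* assumed tag length in bytes */

/* Hypothetical decoded tag: why the frame reached the CPU, and from where. */
struct hyp_tag {
    uint8_t reason;           /* byte 0: reason code */
    uint16_t src_port_bitmap; /* bytes 2-3: big-endian port bitmap */
};

/* Decode the tag at the start of buf.
 * Returns 0 on success, -1 if the buffer is missing or too short. */
static int hyp_tag_parse(const uint8_t *buf, size_t len, struct hyp_tag *out)
{
    assert(out != NULL);
    if (buf == NULL || len < HYP_TAG_LEN)
        return -1;

    out->reason = buf[0];
    /* buf[1] is treated as reserved in this assumed layout. */
    out->src_port_bitmap = (uint16_t)((buf[2] << 8) | buf[3]);
    return 0;
}

/* Return the index of the lowest set bit in the port bitmap, or -1 if none.
 * A per-port netdev model would use this index to pick the receiving netdev. */
static int hyp_tag_first_port(const struct hyp_tag *tag)
{
    for (int i = 0; i < 16; i++) {
        if (tag->src_port_bitmap & (1u << i))
            return i;
    }
    return -1;
}
```

With a per-port netdev model, the decoded port index would select the netdev the frame is delivered on, and the reason code would still need some other channel to reach whoever consumes it.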