Thread: 52 messages, 10 authors, 2012-03-13

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

From: jamal <hidden>
Date: 2012-02-10 15:18:31
Also in: kvm

Hi John,

I went backwards to summarize at the top after going through your email.

TL;DR version 0.1:
You provide a good use case where it makes sense to do things in the
kernel. IMO, you could make the same argument if your embedded switch
could do ACLs, IPv4 forwarding etc., and then the kernel bloats.
I am always biased toward moving all policy control to user space
instead of bloating the kernel.

 
On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:
> Hi Jamal,
>
> The user space app in this case would listen for FDB updates to the SW
> bridge and then mirror them at the embedded NIC. In this case it seems
> easier to just add a notifier chain and let the kernel keep these in
> sync. Otherwise we need a daemon in user space to replicate these.

You need a user space daemon if you need to ensure synchronization.
That's what I meant when I said there was a "disadvantage" in the simple
case, where the goal is always to synchronize.

> On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
> and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
> would have one common interface to drive these. But the bridge already
> has this protocol/msgtype so that would require either some demux or
> new protocol/msgtype pairs to be created.

The bridge is very netlink friendly these days. Given the rest of the
network stack (*NEIGH* you mention above) talks netlink to user space
it should be workable.

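As a rough illustration of the "demux" option mentioned above, the sketch below parses one neighbour message and keys on the `ndm_family` field to tell bridge FDB entries apart from other neighbour entries. The numeric constants come from the Linux UAPI headers; the function names `parse_neigh` and `is_bridge_fdb` are hypothetical, and the choice of `ndm_family` as the demux key is an assumption for illustration, not something this thread settles.

```python
import struct

# Constants from <linux/rtnetlink.h>, <linux/socket.h>, <linux/neighbour.h>.
RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH = 28, 29, 30
AF_BRIDGE = 7
NDA_LLADDR = 2

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NDMSG = struct.Struct("=BBHiHBB")   # family, pad1, pad2, ifindex, state, flags, type
RTATTR = struct.Struct("=HH")       # rta_len, rta_type


def parse_neigh(msg: bytes):
    """Return (msg_type, family, ifindex, mac) for one neighbour message."""
    length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(msg, 0)
    if msg_type not in (RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH):
        raise ValueError("not a neighbour message")
    family, _p1, _p2, ifindex, _state, _nflags, _ntype = NDMSG.unpack_from(
        msg, NLMSGHDR.size
    )
    mac = None
    off = NLMSGHDR.size + NDMSG.size
    while off + RTATTR.size <= length:
        rta_len, rta_type = RTATTR.unpack_from(msg, off)
        if rta_len < RTATTR.size:
            break  # malformed attribute, stop walking
        if rta_type == NDA_LLADDR:
            payload = msg[off + RTATTR.size: off + rta_len]
            mac = ":".join(f"{b:02x}" for b in payload)
        off += (rta_len + 3) & ~3  # attributes are padded to 4 bytes
    return msg_type, family, ifindex, mac


def is_bridge_fdb(msg: bytes) -> bool:
    """Hypothetical demux rule: treat AF_BRIDGE neighbour messages as FDB entries."""
    return parse_neigh(msg)[1] == AF_BRIDGE
```

A user space daemon could feed each message read from a NETLINK_ROUTE socket through such a parser and dispatch on the family, which is one way to share the same message types between the bridge, an embedded bridge, and macvlan.
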
> Let me think on it. I'm tempted by the simplicity of adding notifier
> hooks though.

If something is missing bridge-side it may need to be added (as per
Stephen's comment). I just took it one step further, indicating that
those notifiers also need to speak netlink.

> Actually because the bridge is adding/removing fdb entries dynamically
> maybe its best this gets done in kernel. Here's the example case,

[..]

> With the flow by letters above hope this is not too difficult to follow.
> (A) veth0 a virtual device transmits packet destined for ethx.y
> (B) SW bridge receives frames and updates FDB flooding to C
> (C) eth0 the PF in this case sends the frame to the HW backed by the
>     embedded bridge

Following so far.
Can you have more than one PF per embedded switch? Or is the intent here
purely to do VMs/VF separation?

> (D) The HW embedded switch has a static entry for ethx.y and forwards
>     the frame to the VF or if its a broadcast frame also floods it to
>     the wire and ethx.y

nod.

> (E) ethx.y receives the frame and generates a response to the dest mac of
>     veth0

nod.
Since you said in #D the entries in the switch are static, I am assuming
at this point neither ethx.y nor veth0 exist in the embedded FDB.

> Now here is the potential issue,
>
> (G) The frame transmitted from ethx.y with the destination address of
>     veth0 but the embedded switch is not a learning switch. If the FDB
>     update is done in user space its possible (likely?) that the FDB
>     entry for veth0 has not been added to the embedded switch yet.

Ok, got it - so the catch here is the switch is not capable of learning.
I think this depends on where learning is done. Your intent is to
use the S/W bridge as something that does the learning for you, i.e. in
the kernel. This makes the S/W bridge a MUST-have for this to run.
And that may be the case for your use case.

What if I don't wanna run the S/W bridge at all?
I've been making the point that with a simple knob (Stephen doesn't like
adding such a knob), the S/W bridge could defer learning to user space.
[This way you can add a lot of richness, e.g. ACLs restricting which
MAC addresses are allowed to talk to which others.]
But if I bypass the S/W bridge altogether and learn in user space,
or have a static config with which I populate the embedded switch, I
don't see the issue.

> Now
>     we either have to flood the frame which is not horrible but not
>     ideal or worse if the embedded switch does not support flooding send
>     it to the wire and veth0 never receives it.

If it is a switch it has to flood, no? Otherwise it sounds broken.

> If the SW bridge pushes
>     the FDB update down into the embedded switch the address is for
>     sure in the embedded switches forwarding tables and the switching
>     works as expected.

Yes, there is a small gap between the S/W bridge learning and the
synchronization happening to the embedded NIC switch. That gap gets
larger if you defer learning to user space. But like you said earlier,
during that gap packets are flooded - and do you care if the
synchronization doesn't happen immediately?

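The gap described above can be pictured with a toy model. This is only a sketch under stated assumptions: `EmbeddedSwitch`, `SoftwareBridge`, `simulate`, the port names, and the MAC address are all hypothetical, and the model assumes the embedded switch floods unknown destinations, which the thread itself leaves open. It compares an in-kernel notifier that programs the embedded FDB immediately with a user space daemon that applies the update later.

```python
class EmbeddedSwitch:
    """Non-learning switch: forwards only on entries programmed from outside."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}

    def program(self, mac, port):
        self.fdb[mac] = port

    def forward(self, dst_mac, in_port):
        if dst_mac in self.fdb:
            return {self.fdb[dst_mac]}
        return self.ports - {in_port}  # unknown destination: flood


class SoftwareBridge:
    """Learning bridge that calls registered notifiers on each new FDB entry."""

    def __init__(self):
        self.fdb = {}
        self.notifiers = []

    def learn(self, src_mac, port):
        if self.fdb.get(src_mac) != port:
            self.fdb[src_mac] = port
            for notify in self.notifiers:
                notify(src_mac)


def simulate(sync_in_kernel):
    """Return the egress port sets (during the gap, after synchronization)."""
    hw = EmbeddedSwitch(["pf0", "vf1", "wire"])
    sw = SoftwareBridge()
    pending = []  # updates a user space daemon has not applied yet
    if sync_in_kernel:
        sw.notifiers.append(lambda mac: hw.program(mac, "pf0"))
    else:
        sw.notifiers.append(pending.append)

    veth0_mac = "02:00:00:00:00:01"
    sw.learn(veth0_mac, "veth0")                       # steps (A)-(B)
    during_gap = hw.forward(veth0_mac, in_port="vf1")  # steps (E)-(G)
    for mac in pending:                                # daemon catches up
        hw.program(mac, "pf0")
    after_sync = hw.forward(veth0_mac, in_port="vf1")
    return during_gap, after_sync
```

In this model `simulate(True)` delivers the reply only to `pf0` both times, while `simulate(False)` floods to `pf0` and `wire` until the daemon has programmed the entry. Whether that flooding window matters is exactly the question raised above.
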
> So to handle this case correctly its probably best IMHO to use a notifier
> hook. Having a RTM_GETNEIGH for the embedded switch implemented though
> would be nice for dumping the FDB of the embedded switch and SET/DEL
> could be used to configure the FDB when its not being driven by the SW
> switch. Of course we should try to be minimalists here.

Do you really need a different *NEIGH* from what we already have?

The problem with putting policies in the kernel is that you are gonna
keep adding more. Bloat user space instead.

cheers,
jamal
