Re: VRFs and the scalability of namespaces

From: David Ahern <hidden>
Date: 2014-09-29 13:06:59

Hi Hannes:

On 9/27/14, 7:29 AM, Hannes Frederic Sowa wrote:

> Did you already investigate how the rule and table features could be
> exploited to suit your needs? Some time back I

I did look into the existing multiple table option, but not to the extent
of creating a POC. It has been on my to-do list for 4+ months now; I just
have not had time to get to it. Based on a number of Google searches to
review the history of VRFs and the kernel, I did see that the use of
multiple routing tables has been suggested as well and that its problems
have been delineated, e.g.,

     http://www.spinics.net/lists/linux-net/msg17502.html

> suggested something like "ip route table foo exec ....", keep a default
> routing lookup indicator in task_struct which gets implicitly propagated
> to rtnetlink routing table requests/modification for the requested
> table. Tables already can be specified via rtnetlink, so no change would
> be needed here.
>
> For sockets something like SO_BINDTOTABLE might work, maybe we can even
> by default use the task_struct information to also bind the sockets to
> the per-process table. We certainly need to preserve the routing
> information on the socket as we need it in ICMP error handling (e.g.
> where to apply IPv4/IPv6 redirects to). Directing incoming packets to a
> specific table also works via ip-rule-iif match.
>
> The advantage of the "ip route table foo exec ..." method would be that
> conversion of some unmodified routing management daemons might be
> easier, others can use rtnetlink extended attributes which are
> already available, and we only need to have per-process context routing
> table control, which seems not too hard to implement in the ip-rule
> subsystem, but I haven't checked.
>
> The problem I see with rules is that some of those tables already work
> hand in hand, they already have implicit semantics, e.g. local, main,
> default and unspec (this is even worse for IPv6, where addrconf already
> uses hardcoded tables). Working around this might be very tricky and
> even more problematic to do from user space.
>
> I think I am not yet sure what features you want from VRFs, some things
> seem to match the rule/table features but others I think are pretty hard
> to implement.

The features of note:
- resource efficiency -- not having to create a process/thread/socket per
VRF to have a "presence" in all VRFs. e.g., a 'VRF any' context that
allows 1 socket to work across VRFs (L3 raw socket, TCP listen socket,
unconnected UDP socket). Daemons run in a 'VRF any' context; connected
clients run in a specific VRF context. For non-connected sockets the VRF
context can be passed via cmsg.

- same IP address on different interfaces in different VRFs, i.e., VRF
specific routing and neighbor tables

- cross VRF routing. The ability to receive a message on 1 VRF and send
it on another. Can be handled by the process itself (e.g., L3 VPNs).
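
As a rough illustration of the second and third features (not kernel
code, and not a proposed API), the toy Python model below gives each VRF
its own routing table and picks the VRF from the ingress interface. The
names Vrf, Router and route_from, the VRF names, the interface names and
the addresses are all made up for the example.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Vrf:
    """Toy VRF: a routing table that is independent of every other VRF."""
    name: str
    routes: list = field(default_factory=list)  # (network, out_interface)

    def add_route(self, prefix, dev):
        self.routes.append((ipaddress.ip_network(prefix), dev))

    def lookup(self, addr):
        """Longest-prefix match restricted to this VRF's table."""
        ip = ipaddress.ip_address(addr)
        matches = [(net, dev) for net, dev in self.routes if ip in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


class Router:
    """Hypothetical container mapping ingress interfaces to VRFs."""

    def __init__(self):
        self.vrfs = {}
        self.iif_to_vrf = {}

    def add_vrf(self, name, interfaces):
        vrf = Vrf(name)
        self.vrfs[name] = vrf
        for dev in interfaces:
            self.iif_to_vrf[dev] = name
        return vrf

    def route_from(self, iif, dst):
        """Return (vrf_name, out_interface) for a packet arriving on iif."""
        vrf_name = self.iif_to_vrf[iif]
        return vrf_name, self.vrfs[vrf_name].lookup(dst)


router = Router()
red = router.add_vrf("red", ["eth1"])
blue = router.add_vrf("blue", ["eth2", "eth3"])

# The same prefix exists in both VRFs without conflict.
red.add_route("10.0.0.0/24", "eth1")
blue.add_route("10.0.0.0/24", "eth2")
blue.add_route("0.0.0.0/0", "eth3")
```

Here router.route_from("eth1", "10.0.0.5") resolves in the "red" table
and router.route_from("eth2", "10.0.0.5") in the "blue" one, even though
the destination is identical. Cross VRF routing in this model is just the
process choosing a different table explicitly, e.g. receiving on eth1 and
then calling router.vrfs["blue"].lookup(dst).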

Thanks,
David
