Re: Flows! Offload them.

From: Neil Horman <nhorman@tuxdriver.com>
Date: 2015-02-27 01:23:18

On Fri, Feb 27, 2015 at 06:52:58AM +0900, Simon Horman wrote:

On Thu, Feb 26, 2015 at 03:16:35PM -0500, Neil Horman wrote:

On Thu, Feb 26, 2015 at 07:23:36AM -0800, John Fastabend wrote:

On 02/26/2015 05:33 AM, Thomas Graf wrote:

On 02/26/15 at 10:16am, Jiri Pirko wrote:

Well, on netdev01, I believe that a consensus was reached that for every
switch offloaded functionality there has to be an implementation in
kernel.

Agreed. This should not prevent the policy being driven from user
space though.

What John's Flow API originally did was to provide a way to
configure hardware independently of kernel. So the right way is to
configure kernel and, if hw allows it, to offload the configuration to hw.

In this case, seems to me logical to offload from one place, that being
TC. The reason is, as I stated above, the possible conversion from OVS
datapath to TC.

Offloading of TC definitely makes a lot of sense. I think that even in
that case you will already encounter independent configuration of
hardware and kernel. Example: The hardware provides a fixed, generic
function to push up to n bytes onto a packet. This hardware function
could be used to implement TC actions "push_vlan", "push_vxlan",
"push_mpls". You would likely agree that TC should make use
of such a function even if the hardware version is different from the
software version. So I don't think we'll have a 1:1 mapping for all
configurations, regardless of whether the how is decided in kernel or
user space.
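
To make the "push up to n bytes" example concrete, here is a minimal sketch of
how one generic insert primitive could back a "push_vlan" action. The names
push_bytes and push_vlan are hypothetical and are not taken from any kernel or
driver API; only the 802.1Q tag layout (TPID 0x8100 followed by a 16-bit TCI,
inserted after the two MAC addresses) is standard.

```python
import struct

ETH_ADDRS_LEN = 12      # destination MAC + source MAC
TPID_8021Q = 0x8100     # standard 802.1Q tag protocol identifier


def push_bytes(packet: bytes, offset: int, data: bytes) -> bytes:
    """Hypothetical generic primitive: insert `data` into `packet` at `offset`."""
    if not 0 <= offset <= len(packet):
        raise ValueError("offset outside packet")
    return packet[:offset] + data + packet[offset:]


def push_vlan(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Express an 802.1Q tag push in terms of the generic primitive."""
    if not 0 <= vid < 4096 or not 0 <= pcp < 8:
        raise ValueError("vid must fit in 12 bits and pcp in 3 bits")
    tci = (pcp << 13) | vid                      # DEI bit left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)    # 4 bytes, network byte order
    return push_bytes(frame, ETH_ADDRS_LEN, tag)
```

The software action knows it is pushing a VLAN tag, while the primitive only
knows an offset and some bytes, which is the mismatch described above.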

Just to expand slightly on this. I don't think you can get to a 1:1
mapping here. One reason is hardware typically has a TCAM and limited
size. So you need a _policy_ to determine when to push rules into the
hardware. The kernel doesn't know when to do this and I don't believe
it's the kernel's place to start enforcing policy like this. One thing I likely
need to do is get some more "worlds" in rocker so we aren't stuck only
thinking about the infinite size OF_DPA world. The OF_DPA world is only
one world and not a terribly flexible one at that when compared with the
NPU folk. So minimally you need a flag to indicate rules go into hardware
vs software.
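
As a rough illustration of the flag idea, the sketch below models a datapath
with a small, fixed-size hardware table and an unbounded software table. The
caller states where each rule goes and the datapath never decides on its own.
All names here (Placement, Datapath, TableFull, hw_capacity) are hypothetical
and do not correspond to an existing interface.

```python
from enum import Enum


class Placement(Enum):
    SOFTWARE = "sw"
    HARDWARE = "hw"


class TableFull(Exception):
    """Raised when the (hypothetical) hardware table has no free entries."""


class Datapath:
    def __init__(self, hw_capacity: int):
        self.hw_capacity = hw_capacity   # models a limited TCAM
        self.hw_rules = []
        self.sw_rules = []

    def add_rule(self, rule: str, placement: Placement) -> None:
        """Install a rule exactly where the caller asked for it."""
        if placement is Placement.HARDWARE:
            if len(self.hw_rules) >= self.hw_capacity:
                # No silent fallback: what to do next is the caller's policy.
                raise TableFull("hardware table is full")
            self.hw_rules.append(rule)
        else:
            self.sw_rules.append(rule)
```

Raising an error instead of falling back to software keeps the policy decision
with whoever set the flag, which is one possible reading of the argument above.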

That said I think the bigger mismatch between software and hardware is
you program it differently because the data structures are different. Maybe
a u32 example would help. For parsing with u32 you might build a parse
graph with a root and some leaf nodes. In hardware you want to collapse
this down onto the hardware. I argue this is not a kernel task because
there are lots of ways to do this and there are trade-offs made with
respect to space and performance and which table to use when it could be
handled by a set of tables. Another example is a virtual switch possibly
OVS but we have others. The software does some "unmasking" (their term)
before sending the rules into the software dataplane cache. Basically this
means we can ignore priority in the hash lookup. However this is not how you
would optimally use hardware. Maybe I should do another write up with
some more concrete examples.

There are also lots of use cases to _not_ have hardware and software in
sync. A flag allows this.

My only point is I think we need to allow users to optimally use their
hardware either via 'tc' or my previous 'flow' tool. Actually in my
opinion I still think it's best to have both interfaces.

I'll go get some coffee now and hopefully that is somewhat clear.


I've been thinking about the policy aspect of this, and the more I think
about it, the more I wonder if not allowing some sort of common policy in
the kernel is really the right thing to do here.  I know that's somewhat
blasphemous, but this isn't really administrative policy that we're
talking about, at least not 100%.  It's more of a behavioral profile that
we're trying to enforce.  That may be splitting hairs, but I think there's
precedent for the latter.  That is to say, we configure qdiscs to limit
traffic flow to certain rates, and configure policies which drop traffic
that violates them (which includes random discard, which is the antithesis
of deterministic policy).  I'm not sure I see this as any different,
especially if we limit its scope.  That is to say, why couldn't we allow
the kernel to program a predetermined set of policies that the admin can
set (e.g. offload routing to a hardware cache of X size with LRU
victimization).  If other well-defined policies make sense, we can add
them and expose options via iproute2 or some such to set them.  For the
use case where such pre-packaged policies don't make sense, we have
things like the flow api to offer users who want to be able to control
their hardware in a more fine grained approach.
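
A minimal sketch of the "cache of X size with LRU victimization" policy
mentioned above, assuming routes are keyed by prefix string and that a miss
simply means the packet takes the software path. LruOffloadCache and its
methods are hypothetical names for illustration, not an existing kernel or
iproute2 interface.

```python
from collections import OrderedDict


class LruOffloadCache:
    """Models a hardware route table of fixed size with LRU eviction."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()   # oldest entry first

    def lookup(self, prefix: str):
        """Return the offloaded next hop, or None to fall back to software."""
        if prefix not in self.entries:
            return None
        self.entries.move_to_end(prefix)   # mark as most recently used
        return self.entries[prefix]

    def offload(self, prefix: str, next_hop: str):
        """Install a route; return the evicted prefix, or None if none."""
        if prefix in self.entries:
            self.entries[prefix] = next_hop
            self.entries.move_to_end(prefix)
            return None
        victim = None
        if len(self.entries) >= self.capacity:
            victim, _ = self.entries.popitem(last=False)   # least recently used
        self.entries[prefix] = next_hop
        return victim
```

The admin-visible knob in such a scheme would be just the capacity; which
routes end up in hardware follows from traffic, not from per-rule configuration.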

In general I agree that it makes sense to have a sane offload policy
in the kernel and provide a mechanism to override that. Things that already
work should continue to work: just faster or with fewer CPU cycles consumed.

Yes, exactly that, for the general traditional networking use case, that is
exactly what we want, to opportunistically move traffic faster with less load on
the cpu.  We don't nominally care what traffic is offloaded, as long as the
hardware does a better job than just software alone.  If we get an occasional
miss and have to do stuff in software, so be it.

I am, however, not entirely convinced that it is always possible to
implement such a sane default policy that is worth the code complexity -
I'm thinking in particular of Open vSwitch where management of flows is
already in user-space.

So, this is a case in which I think John F.'s low level flow API is better
suited.  OVS has implemented a user space dataplane that circumvents a lot of the
kernel mechanisms for traffic forwarding.  For that sort of application, the
traditional kernel offload "objects" aren't really appropriate.  Instead, OVS
can use the low level flow API to construct its own custom offload pipeline
using whatever rules and policies that it wants.

Of course, using the low level flow API is incompatible with the in-kernel
object offload idea that I'm proposing, but I see the two as able to co-exist,
much like firewalld co-exists with iptables.  You can use both, but you have to
be aware that using the lower layer interface might break the other's higher
level operations.  And if that happens, it's on you to manage it.

Best
Neil
