Re: [RFC] tunnel endpoint hw acceleration enablement

From: Shahaf Shuler <hidden>
Date: 2018-02-01 19:59:21

Hi Declan, sorry for the late response. 

Tuesday, January 23, 2018 5:36 PM, Doherty, Declan:

If I understand correctly, the API proposed here introduces a tunnel endpoint
(TEP), which is a logical port on top of an ethdev port. The TEP is able to
receive and monitor specific tunneled traffic, for example VXLAN, GENEVE and more.

For example, a VXLAN TEP can have multiple flows with different VNIs, all
under the same context.

Now, with the current rte_flow APIs, we can do exactly the same and give
the application full flexibility to group the tunnel flows into a logical TEP.

With this suggestion, the application will:
1. Create rte_flow rules for the pattern it wants to receive.
2. If it is interested in counting, add a COUNT action to the flow.

3. If header manipulation is required, add a DECAP/ENCAP/REWRITE action
to the flow.

4. Grouping flows into a logical TEP will be done in the application layer,
simply by keeping the relevant rte_flow rules in some dedicated struct. With
it, creating/destroying a TEP translates to creating/destroying the flow rules.
Statistics can be obtained by querying each flow's counter and summing the
results. Note that some devices can share the same counter across multiple
flows. Even though this is not yet exposed in rte_flow, it could be an
interesting optimization.
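
A minimal sketch of step 4, under stated assumptions: the application keeps its flow handles in a small struct and sums their counters. So that it compiles without DPDK, `struct flow_stub` and `flow_stub_query_count()` are hypothetical stand-ins for a `struct rte_flow *` created with a COUNT action and the matching `rte_flow_query()` call; `struct app_tep` and its helpers are illustrative names, not DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a flow created with a COUNT action
 * (a struct rte_flow * queried with rte_flow_query() in DPDK). */
struct flow_stub {
	uint64_t hits;   /* packets matched so far */
	uint64_t bytes;  /* bytes matched so far */
};

/* Hypothetical stand-in for querying the flow's COUNT action. */
static void flow_stub_query_count(const struct flow_stub *f,
				  uint64_t *hits, uint64_t *bytes)
{
	*hits = f->hits;
	*bytes = f->bytes;
}

#define APP_TEP_MAX_FLOWS 16

/* Application-side logical TEP: only a container of flow handles. */
struct app_tep {
	struct flow_stub *flows[APP_TEP_MAX_FLOWS];
	size_t nb_flows;
};

/* Adding a flow to the TEP is pure bookkeeping in the application. */
static int app_tep_add_flow(struct app_tep *tep, struct flow_stub *flow)
{
	assert(tep != NULL && flow != NULL);
	if (tep->nb_flows == APP_TEP_MAX_FLOWS)
		return -1;
	tep->flows[tep->nb_flows++] = flow;
	return 0;
}

/* TEP statistics are the sum of the per-flow counters. */
static void app_tep_stats(const struct app_tep *tep,
			  uint64_t *hits, uint64_t *bytes)
{
	size_t i;

	*hits = 0;
	*bytes = 0;
	for (i = 0; i < tep->nb_flows; i++) {
		uint64_t h, b;

		flow_stub_query_count(tep->flows[i], &h, &b);
		*hits += h;
		*bytes += b;
	}
}

/* Destroying the TEP means destroying its flows (rte_flow_destroy()
 * in DPDK); with the stubs they are simply forgotten. */
static void app_tep_destroy(struct app_tep *tep)
{
	tep->nb_flows = 0;
}
```

Creating or destroying the TEP then reduces to creating or destroying its flows, with no extra control object in the driver.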

As I responded in John's mail, I think this approach fails on devices which
also support switching offload. As the flows never hit the host application
configuring the TEP and flows, there is no easy way to sum those statistics,

Devices which support switching offloads must use NIC support to count the flows. This can be done either by associating a COUNT action with a flow or by using a TEP as in your proposal.
TEP counting could be introduced in another way: instead of having a 1:1 relation between flow counter and rte_flow, introduce a counter element which can be attached to multiple flows.
This counter element, along with the rte_flows it is associated with, is basically the TEP:
1. It holds the sum of statistics from all the TEP flows it is associated with.
2. It holds the receive pattern.
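
The shared counter element described above can be modeled in a few lines. This is a software model, not an rte_flow API: `struct shared_counter`, `struct flow_model` and the helpers are hypothetical, and `flow_model_hit()` stands in for the device updating the counter when a packet matches one of the attached flows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter element that several flows can share. */
struct shared_counter {
	uint64_t hits;
	uint64_t bytes;
	unsigned int refcnt;  /* number of flows attached */
};

/* Hypothetical flow: only its attached counter matters here. */
struct flow_model {
	struct shared_counter *cnt;
};

static void flow_attach_counter(struct flow_model *f, struct shared_counter *c)
{
	assert(f->cnt == NULL);
	f->cnt = c;
	c->refcnt++;
}

static void flow_detach_counter(struct flow_model *f)
{
	assert(f->cnt != NULL && f->cnt->refcnt > 0);
	f->cnt->refcnt--;
	f->cnt = NULL;
}

/* Models the device on a match: whichever flow matched, the shared
 * counter is the one that gets updated. */
static void flow_model_hit(struct flow_model *f, uint32_t pkt_len)
{
	if (f->cnt != NULL) {
		f->cnt->hits++;
		f->cnt->bytes += pkt_len;
	}
}
```

Reading the shared counter once yields the TEP total without a per-flow query loop, which is what matters when switched traffic never reaches the host.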

My point is, I don't think it is correct to bind the TEP to the switching offload actions (encap/decap/rewrite in this context).
The TEP can be presented as an auxiliary library/API to help with flow grouping; however, the application still needs the ability to control the switch offloads as it wishes.

also, flows are transitory in terms of runtime, so it would not be possible to
keep accurate statistics over a period of time.

I'm not sure I understand what you mean here.
In order to receive traffic you need flows. Even the default RSS configuration of the PMD can be described by rte_flows.
So as long as one receives traffic, one or more flows are configured on the device.

As for the capabilities, what specifically did you have in mind? The
current usage you show with TEP is with rte_flow rules. There are currently
no capabilities for the actions/patterns supported by rte_flow. To
check such capabilities, the application uses rte_flow_validate.

I envisaged that the application should be able to see if an ethdev
can support TEP in the rx/tx offloads, and then the
rte_tep_capabilities would allow applications to query what tunnel
endpoint protocols are supported etc. I would like a simple
mechanism to allow users to see if a particular tunnel endpoint type
is supported without having to build actual flows to validate.

I can see the value of that, but in the end wouldn't the API call
rte_flow_validate anyway? Maybe we don't add the layer now, or maybe
it doesn't really belong in DPDK? I'm in favor of deferring the
capabilities API until we know it's really needed. I hate to see
special capabilities APIs start sneaking in after we decided to go
the rte_flow_validate route and users are starting to get used to it.

I don't see how it is different from any other rte_flow creation.
We don't hold caps for the device's ability to filter packets according to VXLAN or
GENEVE items. Why should we start now?
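
Probing support by trial validation, rather than through a dedicated capabilities API, could look roughly like the sketch below. The `validate_fn` callback is an assumption standing in for a wrapper around `rte_flow_validate()` on a minimal ETH / IPV4 / UDP / tunnel pattern; `fake_validate()` is a hypothetical device used only so the example runs.

```c
#include <assert.h>
#include <stdint.h>

enum tunnel_type { TUNNEL_VXLAN, TUNNEL_GENEVE, TUNNEL_NVGRE, TUNNEL_MAX };

/* Hypothetical callback: in DPDK this would build a minimal pattern for
 * the given tunnel type and return rte_flow_validate()'s result
 * (0 when the rule is accepted). */
typedef int (*validate_fn)(uint16_t port_id, enum tunnel_type type);

/* Derive a capability bitmask by trial validation, one bit per type. */
static uint32_t probe_tunnel_caps(uint16_t port_id, validate_fn validate)
{
	uint32_t caps = 0;
	int t;

	for (t = 0; t < TUNNEL_MAX; t++)
		if (validate(port_id, (enum tunnel_type)t) == 0)
			caps |= UINT32_C(1) << t;
	return caps;
}

/* Hypothetical device for the demo: accepts VXLAN and GENEVE only. */
static int fake_validate(uint16_t port_id, enum tunnel_type type)
{
	(void)port_id;
	return (type == TUNNEL_VXLAN || type == TUNNEL_GENEVE) ? 0 : -1;
}
```

A helper like this could live in the application or a small library on top of rte_flow, which keeps the "validate is the capability check" model intact.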

I don't know, possibly if it makes adoption of the features easier for the end
user.

We already have rte_flow_validate. I think part of the reason for it was
that the number of different capabilities possible with rte_flow is huge. I
think this is also the case with the TEP capabilities (even though it is still not
clear to me what exactly they will include).

It may be that we only need to advertise that we are capable of encap/decap
services, but it would be good to have input from downstream users on what
they would like to see.

Regarding the creation/destruction of TEP: why not simply use the rte_flow
API and avoid this extra control?

For example, with the 17.11 APIs, the application can put the port in
isolate mode and insert a flow rule to catch only IPv4 VXLAN traffic and
direct it to some queue or do RSS. Such an operation, per my understanding,
will create a tunnel endpoint. What are the downsides of doing it with
the current APIs?
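
For reference, this is what such an IPv4 VXLAN rule matches, written as a plain software classifier. It is not how the rule would be expressed in DPDK (there it would be an ETH / IPV4 / UDP / VXLAN pattern with a QUEUE or RSS action, created after enabling isolate mode with `rte_flow_isolate()`); it assumes untagged Ethernet frames and the IANA-assigned VXLAN UDP port 4789, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14
#define ETHERTYPE_IP  0x0800
#define IP_PROTO_UDP  17
#define UDP_HDR_LEN   8
#define VXLAN_PORT    4789   /* IANA-assigned VXLAN UDP port */
#define VXLAN_HDR_LEN 8

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Returns 1 and stores the VNI if the frame is IPv4/UDP/VXLAN, else 0.
 * Assumes an untagged Ethernet header. */
static int match_ipv4_vxlan(const uint8_t *frame, size_t len, uint32_t *vni)
{
	size_t ihl, udp_off, vx_off;

	if (len < ETH_HDR_LEN + 20 || rd16(frame + 12) != ETHERTYPE_IP)
		return 0;
	const uint8_t *ip = frame + ETH_HDR_LEN;
	if ((ip[0] >> 4) != 4)
		return 0;
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ihl < 20 || ip[9] != IP_PROTO_UDP)
		return 0;
	udp_off = ETH_HDR_LEN + ihl;
	vx_off = udp_off + UDP_HDR_LEN;
	if (len < vx_off + VXLAN_HDR_LEN)
		return 0;
	if (rd16(frame + udp_off + 2) != VXLAN_PORT)  /* UDP dst port */
		return 0;
	if (!(frame[vx_off] & 0x08))  /* I flag: VNI is valid */
		return 0;
	*vni = (uint32_t)frame[vx_off + 4] << 16 |
	       (uint32_t)frame[vx_off + 5] << 8 | frame[vx_off + 6];
	return 1;
}
```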

That doesn't enable encapsulation and decapsulation of the outer
tunnel endpoint in the hw, as far as I know, and that is in addition to
the inability to monitor the endpoint statistics I mentioned above. It would
also require that you redefine the endpoint's parameters every time you
wish to add a new flow to it. I think having the rte_tep object
semantics should also simplify enabling a full vswitch
offload of TEP, where the hw handles both encap/decap and
switching to a particular port.

If we have the ingress/decap and egress/encap actions and 1 rte_flow
rule per TEP and use the COUNT action, I think we get all but the
last bit. For that, perhaps the application could keep an ingress and
egress rte_flow template for each tunnel type (VxLAN, GRE, ...). Then
copying the template and filling in the outer packet info and tunnel
ID is all that would be required. We could also define these in rte_flow.h?
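
The per-tunnel-type template idea might be sketched as follows. `struct tunnel_rule` is a hypothetical application-side description; translating it into an actual rte_flow pattern and action list is left out, and the 4789 port in the template assumes VXLAN over the IANA-assigned port.

```c
#include <assert.h>
#include <stdint.h>

enum tunnel_type { TUNNEL_VXLAN, TUNNEL_NVGRE };

/* Hypothetical application-side description of one tunnel flow rule. */
struct tunnel_rule {
	enum tunnel_type type;
	uint32_t outer_src_ip;  /* IPv4, host byte order */
	uint32_t outer_dst_ip;
	uint16_t udp_dst_port;  /* 0 when the tunnel is not UDP-based */
	uint32_t tunnel_id;     /* VNI for VXLAN, VSID for NVGRE */
	int decap;              /* 1 = ingress/decap, 0 = egress/encap */
};

/* Per-type template: only the fields common to every rule of that type. */
static const struct tunnel_rule vxlan_ingress_tmpl = {
	.type = TUNNEL_VXLAN, .udp_dst_port = 4789, .decap = 1,
};

/* Copy the template, then fill in the outer addresses and tunnel ID. */
static struct tunnel_rule
tunnel_rule_from_template(const struct tunnel_rule *tmpl,
			  uint32_t src_ip, uint32_t dst_ip, uint32_t tunnel_id)
{
	struct tunnel_rule r = *tmpl;

	assert(tunnel_id <= 0xFFFFFF);  /* VNI and VSID are 24-bit */
	r.outer_src_ip = src_ip;
	r.outer_dst_ip = dst_ip;
	r.tunnel_id = tunnel_id;
	return r;
}
```

The template itself is never modified, so the same template can seed any number of rules.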

To direct traffic flows to a hw-terminated tunnel endpoint, the
rte_flow API is enhanced with a new flow item type. This
contains a pointer to the TEP context as well as the overlay flow
ID with which the traffic flow is associated.

struct rte_flow_item_tep {
	struct rte_tep *tep;
	uint32_t flow_id;
};

Can you provide a more detailed definition of the flow id? To
which field from the packet headers does it refer?

In your examples below, it looks like it is used to match the VXLAN VNI
in the case of VXLAN; what about the other protocols? And also, why not use
the already existing VXLAN item?

I have initially only been looking at a couple of the tunnel endpoint
protocols, namely Geneve, NvGRE, and VxLAN, but the idea here is to
allow the user to define the VNI in the case of Geneve and VxLAN and
the VSID in the case of NvGRE on a per-flow basis, as, per my
understanding, these are used to identify the source/destination
hosts on the overlay network independently of the endpoint they
are transported across.
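
To make the flow id semantics concrete: for VxLAN the per-flow ID is the 24-bit VNI, and for NvGRE it is the 24-bit VSID carried in the upper bits of the GRE key (RFC 7348, RFC 7637). The helpers below, whose names are hypothetical, show where that value lands in the headers; Geneve places its 24-bit VNI at the same byte offsets (4 to 6) of its base header as VxLAN does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VXLAN header (RFC 7348): flags byte with the I bit (0x08), 24 reserved
 * bits, 24-bit VNI, 8 reserved bits. The TEP flow_id becomes the VNI. */
static void vxlan_hdr_from_flow_id(uint32_t flow_id, uint8_t hdr[8])
{
	assert(flow_id <= 0xFFFFFF);
	memset(hdr, 0, 8);
	hdr[0] = 0x08;
	hdr[4] = (uint8_t)(flow_id >> 16);
	hdr[5] = (uint8_t)(flow_id >> 8);
	hdr[6] = (uint8_t)flow_id;
}

/* NVGRE (RFC 7637): 32-bit GRE key = 24-bit VSID, then an 8-bit FlowID
 * used for entropy (unrelated to the TEP flow_id despite the name). */
static uint32_t nvgre_key_from_vsid(uint32_t vsid, uint8_t entropy)
{
	assert(vsid <= 0xFFFFFF);
	return vsid << 8 | entropy;
}
```

Since the item's `flow_id` is 32 bits wide while VNI and VSID are 24 bits, a driver would need to reject or mask out-of-range values.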

The VxLAN item is used in the creation of the TEP object. Using the
TEP object just removes the need for the user to constantly redefine
all the tunnel parameters, and I also think that, depending on the hw
implementation, it may simplify the driver's work if it knows the exact
endpoint the action is for instead of having to look it up on each
flow addition.
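
The difference between redefining tunnel parameters per flow and referencing a TEP object can be shown with a small model of the proposed item. `struct tep_model` and `struct flow_item_tep_model` are hypothetical mirrors of `struct rte_tep` and `struct rte_flow_item_tep` (the full `rte_tep` definition is not given in this thread); the outer parameters are defined once, and each flow carries only its overlay ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TEP object: outer tunnel parameters defined once. */
struct tep_model {
	uint32_t local_ip;      /* IPv4, host byte order */
	uint32_t remote_ip;
	uint16_t udp_dst_port;  /* 4789 for VXLAN */
};

/* Mirrors the proposed item: TEP reference plus per-flow overlay ID. */
struct flow_item_tep_model {
	const struct tep_model *tep;
	uint32_t flow_id;       /* VNI or VSID */
};

/* The full match key a driver would need: outer fields come from the
 * shared TEP, only the overlay ID comes from the flow. */
struct full_match_key {
	uint32_t src_ip, dst_ip;
	uint16_t udp_dst_port;
	uint32_t vni;
};

static struct full_match_key
resolve_ingress_key(const struct flow_item_tep_model *item)
{
	struct full_match_key k;

	assert(item->tep != NULL);
	/* Ingress: traffic flows from the remote endpoint to the local one. */
	k.src_ip = item->tep->remote_ip;
	k.dst_ip = item->tep->local_ip;
	k.udp_dst_port = item->tep->udp_dst_port;
	k.vni = item->flow_id;
	return k;
}
```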

Generally I like the idea of separating the encap/decap context
from the action. However, it looks like the rte_flow_item has a double
meaning in this RFC: once for the classification and once for the action.

Off the top of my head, I would think of an API which separates
those and reuses the existing flow items. Something like:

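struct rte_flow_item pattern[] = {
	{ /* set of already existing pattern items */ },
	{ ... },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};

/* proposed helper, not an existing API */
encap_ctx = create_encap_context(pattern);

struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = encap_ctx }, /* proposed action type */
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};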

I'm not sure I fully understand what you're asking here, but in
general, for encap you would only define the inner part of the packet
in the match pattern criteria, and the actual outer tunnel headers
would be defined in the action.
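
A minimal sketch of this split, assuming the encap context simply stores prebuilt outer headers: the inner packet is what the pattern matched, and the action prepends (encap) or strips (decap) the outer headers described by the same context. `struct encap_ctx`, `encap_apply()` and `decap_apply()` are hypothetical, and the outer IP/UDP length and checksum fixups a real datapath needs are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENCAP_HDR_MAX 64

/* Hypothetical encap context: prebuilt outer headers (for example outer
 * Ethernet/IPv4/UDP/VXLAN) that the action prepends. */
struct encap_ctx {
	uint8_t hdr[ENCAP_HDR_MAX];
	size_t hdr_len;
};

/* Encap: out = outer headers + inner frame. Returns bytes written, or 0
 * if out is too small. Outer length/checksum fixups are not done here. */
static size_t encap_apply(const struct encap_ctx *ctx, const uint8_t *inner,
			  size_t inner_len, uint8_t *out, size_t out_size)
{
	assert(ctx->hdr_len <= ENCAP_HDR_MAX);
	if (out_size < ctx->hdr_len + inner_len)
		return 0;
	memcpy(out, ctx->hdr, ctx->hdr_len);
	memcpy(out + ctx->hdr_len, inner, inner_len);
	return ctx->hdr_len + inner_len;
}

/* Decap with the same context: strip hdr_len bytes (assumes fixed-size
 * outer headers, no options) and return the inner frame. */
static const uint8_t *decap_apply(const struct encap_ctx *ctx,
				  const uint8_t *pkt, size_t pkt_len,
				  size_t *inner_len)
{
	if (pkt_len < ctx->hdr_len)
		return NULL;
	*inner_len = pkt_len - ctx->hdr_len;
	return pkt + ctx->hdr_len;
}
```

Using one context for both directions is one way to get the encap/decap symmetry discussed in the next paragraph.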

I guess there is some replication on the decap side as proposed, as
the TEP object is used in both the pattern and the action. Possibly
you could get away with having no TEP object defined in the action
data, but I prefer keeping the API symmetrical for encap/decap
actions at the cost of some extra verbosity.

...
