Thread (260 messages), 21 authors, 2017-11-14

Re: [RFC] Generic flow director/filtering/classification API

From: John Fastabend <john.fastabend@gmail.com>
Date: 2016-08-10 16:46:53

On 16-08-10 06:37 AM, Adrien Mazarguil wrote:
On Tue, Aug 09, 2016 at 02:47:44PM -0700, John Fastabend wrote:
On 16-08-04 06:24 AM, Adrien Mazarguil wrote:
On Wed, Aug 03, 2016 at 12:11:56PM -0700, John Fastabend wrote:
[...]
The problem is keeping priorities in order and/or possibly breaking
rules apart (e.g. you have an L2 table and an L3 table) becomes very
complex to manage at driver level. I think it's easier for the
application, which has some context, to do this. The application "knows"
if it's a router, for example, and will likely be able to pack rules better
than a PMD will.
I don't think most applications know they are L2 or L3 routers. They may not
know more than the pattern provided to the PMD, which may indeed end at a L2
or L3 protocol. If the application simply chooses a table based on this
information, then the PMD could have easily done the same.
But when we start thinking about encap/decap then it's natural to start
using this interface to implement various forwarding dataplanes. And one
common way to organize a switch is into a TEP, router, switch
(mac/vlan), ACL tables, etc. In fact we see this topology starting to
show up in the NICs now.

Further each table may be "managed" by a different entity. In which
case the software will want to manage the physical and virtual networks
separately.

It doesn't make sense to me to require a software aggregator object to
marshal the rules into a flat table then for a PMD to split them apart
again.
OK, my point was mostly about handling basic cases easily and making sure
applications do not have to bother with petty HW details when they do not
want to, yet still get maximum performance by having the PMD make the most
appropriate choices automatically.

You've convinced me that in many cases PMDs won't be able to optimize
efficiently and that conscious applications will know better. The API has to
provide the ability to do so. I think it's fine as long as it is not
mandatory.
Great. I also agree making the table feature _not_ mandatory for many use
cases will be helpful. I'm just making sure we get all the use cases I
know of covered.
I understand the issue is what happens when applications really want to
define e.g. L2/L3/L2 rules in this specific order (or any ordering that
cannot be satisfied by HW due to table constraints).

By exposing tables, in such a case applications should move all rules from
L2 to a L3 table themselves (assuming this is even supported) to guarantee
ordering between rules, or fail to add them. This is basically what the PMD
could have done, possibly in a more efficient manner in my opinion.
I disagree with the more efficient comment :)

If the software layer is working on L2/TEP/ACL/router layers, merging
them just to pull them back apart is not going to be more efficient.
Moving flow rules around cannot be efficient by definition, however I think
that attempting to describe table capabilities may be as complicated as
describing HW bit-masking features. Applications may get it wrong as a
result while a PMD would not make any mistake.

Your use case is valid though, if the application already groups rules, then
sharing this information with the PMD would make sense from a performance
standpoint.
Let's assume two opposite scenarios for this discussion:

- App #1 is a command-line interface directly mapped to flow rules, which
  basically gets slow random input from users depending on how they want to
  configure their traffic. All rules differ considerably (L2, L3, L4, some
  with incomplete bit-masks, etc). All in all, few but complex rules with
  specific priorities.
Agree with this and in this case the application should be behind any
network physical/virtual and not giving rules like encap/decap/etc. This
application either sits on the physical function and "owns" the hardware
resource or sits behind a virtual switch.

- App #2 is something like OVS, creating and deleting a large number of very
  specific (without incomplete bit-masks) and mostly identical
  single-priority rules automatically and very frequently.
Maybe for OVS but not all virtual switches are built with flat tables
at the bottom like this. Nor is it necessarily optimal.

Another application (the one I'm concerned about :) would be built as
a pipeline, something like

	ACL -> TEP -> ACL -> VEB -> ACL

If I have hardware that supports a TEP hardware block, an ACL hardware
block and a VEB block, for example, I don't want to merge my control
plane into a single table. The merging in this case is just pure
overhead/complexity for no gain.
It could be done by dedicating priority ranges for each item in the
pipeline but then it would be clunky. OK then, let's discuss the best
approach to implement this.
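
To illustrate why dedicated priority ranges get clunky, here is a minimal sketch of that mapping. Everything in it is hypothetical: the stage names, the window size STAGE_PRIO_SPAN and the helper flat_priority() are not part of any proposed API. It assumes a lower value means a higher priority.

```c
#include <stdint.h>

/* Hypothetical pipeline stages, in match order:
 * ACL -> TEP -> ACL -> VEB -> ACL. */
enum stage {
    STAGE_ACL_IN,
    STAGE_TEP,
    STAGE_ACL_MID,
    STAGE_VEB,
    STAGE_ACL_OUT,
    STAGE_COUNT
};

/* Assumed size of the priority window reserved for each stage. */
#define STAGE_PRIO_SPAN 1000u

/* Map (stage, per-stage priority) onto one flat priority space.
 * Returns -1 when the stage is unknown or the per-stage priority
 * does not fit in the stage's window. */
int64_t flat_priority(enum stage s, uint32_t local_prio)
{
    if ((unsigned)s >= STAGE_COUNT || local_prio >= STAGE_PRIO_SPAN)
        return -1;
    return (int64_t)s * STAGE_PRIO_SPAN + local_prio;
}
```

In this sketch flat_priority(STAGE_TEP, 5) yields 1005, and any stage needing more than STAGE_PRIO_SPAN distinct priorities simply fails, which is the kind of fixed partitioning the application would have to plan around.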

[...]
It's not about mask vs no mask. The devices with multiple tables that I
have don't have these mask limitations. It's about how to optimally pack
the rules and who implements that logic. I think it's best done in the
application where I have the context.

Is there a way to omit the table field if the PMD is expected to do
a best effort and add the table field if the user wants explicit
control over table mgmt? This would support both models. I at least
would like to have explicit control over rule population in my pipeline
for use cases where I'm building a pipeline on top of the hardware.
Yes that's a possibility. Perhaps the table ID to use could be specified as
a meta pattern item? We'd still need methods to report how many tables exist
and perhaps some way to report their limitations; these could be added later
through a separate set of functions.
Sure I think a meta pattern item would be fine or put it in the API call
directly, something like

  rte_flow_create(port_id, pattern, actions);
  rte_flow_create_table(port_id, table_id, pattern, actions);
I suggest using a common method for both cases, either seems fine to me, as
long as a default table value can be provided (zero) when applications do
not care.
Works for me, just use zero as the default when the application has no
preference and expects the PMD to do the table mapping.
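
A rough sketch of the "zero means no preference" convention, for illustration only. The names TABLE_DEFAULT and place_rule() are made up, as is the assumption that explicit tables are numbered 1..n_tables. pmd_pick stands in for whatever choice the driver would make on its own.

```c
#include <stdint.h>

/* Assumed convention: table ID zero means "no preference,
 * let the PMD do the table mapping". */
#define TABLE_DEFAULT 0u

/* Return the table a rule would land in, or -1 if that table
 * does not exist.
 *   table_id: table requested by the application (0 = default)
 *   n_tables: number of tables exposed, numbered 1..n_tables
 *   pmd_pick: table the driver would choose by itself */
int place_rule(uint32_t table_id, uint32_t n_tables, uint32_t pmd_pick)
{
    uint32_t target = (table_id == TABLE_DEFAULT) ? pmd_pick : table_id;

    if (target == 0 || target > n_tables)
        return -1;
    return (int)target;
}
```

With a single entry point like this, an application that does not care always passes TABLE_DEFAULT, while a pipeline-style application passes explicit IDs and gets an error instead of a silent remap when a table is missing.
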
Now about table management, I think there is no need to expose table
capabilities (in case they have different capabilities) but instead provide
guidelines as part of the specification to encourage application writers to
group similar rules in tables. As previously discussed, flow rule priorities
would be specific to the table they are assigned to.
This seems sufficient to me.
Like flow rules, table priorities could be handled through their index with
index 0 having the highest priority. Like flow rule priorities, table
indices wouldn't have to be contiguous.
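
As a small illustration of non-contiguous indices, the sketch below sorts a set of table indices into evaluation order, lowest index first. The helper name order_groups() is hypothetical and nothing here is tied to a real driver.

```c
#include <stdint.h>
#include <stdlib.h>

/* Ascending comparison for qsort() that avoids overflow on subtraction. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Sort table indices into evaluation order:
 * index 0 (or the lowest index present) has the highest priority.
 * Gaps between indices are allowed. */
void order_groups(uint32_t *groups, size_t n)
{
    qsort(groups, n, sizeof groups[0], cmp_u32);
}
```

Given the indices 40, 0 and 7, the resulting order is 0, 7, 40. An application can therefore leave gaps and slot a new table in between later without renumbering the others.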

If this works for you, how about renaming "tables" to "groups"?
Works for me. And actually I like renaming them "groups" as this seems
more neutral to how the hardware actually implements a group. For
example I've worked on hardware with multiple Tunnel Endpoint engines
but we exposed it as a single "group" to simplify the user interface.

.John