Thread: 260 messages, 21 authors, 2017-11-14

Re: [RFC] Generic flow director/filtering/classification API

From: Lu, Wenzhuo <hidden>
Date: 2016-07-20 02:16:56

Hi Adrien,

-----Original Message-----
From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
Sent: Tuesday, July 19, 2016 9:12 PM
To: Lu, Wenzhuo
Cc: dev@dpdk.org; Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
Ajit Khaparde; Rahul Lakkireddy; Jan Medala; John Daley; Chen, Jing D; Ananyev,
Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
Guarch, Pablo; Olga Shern
Subject: Re: [RFC] Generic flow director/filtering/classification API

On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
Hi Adrien,
Thanks for your clarification. Most of my questions are answered, but a few
points may still need to be discussed; comments below.

Hi Wenzhuo,

Please see below.

[...]
Requirements for a new API:

- Flexible and extensible without causing API/ABI problems for existing
  applications.
- Should be unambiguous and easy to use.
- Support existing filtering features and actions listed in `Filter types`_.
- Support packet alteration.
- In case of overlapping filters, their priority should be well documented.
Does that mean we don't guarantee consistent priorities? The priority can be
different on different NICs, so the behavior of the actions can be different.
Right?

No, the intent is precisely to define what happens in order to get a
consistent result across different devices, and document cases with
undefined behavior.
There must be no room left for interpretation.

For example, the API must describe what happens when two overlapping
filters (e.g. one matching an Ethernet header, another one matching
an IP header) match a given packet at a given priority level.

It is documented in section 4.1.1 (priorities) as "undefined behavior".
Applications remain free to do it and deal with consequences, at
least they know they cannot expect a consistent outcome, unless they
use different priority levels for both rules, see also 4.4.5 (flow rules priority).
Seems the users still need to be aware of some details of the HW? Do we need
to add negotiation for the priority?

Priorities as defined in this document may not be directly mappable
to HW capabilities (e.g. HW does not support enough priorities, or
some corner cases make them not work as described), in which
case the PMD may choose to simulate priorities (again 4.4.5), as
long as the end result follows the specification.

So users need not be aware of HW details; the PMD is, and must perform the
needed workarounds to meet their expectations. Users may only be impacted by
errors while attempting to create rules that are either unsupported or would
cause them (or existing rules) to diverge from the spec.
The problem is that sometimes the priority of the filters is fixed by the
HW's implementation. For example, on ixgbe, n-tuple has a higher
priority than flow director.
As a side note I did not know that N-tuple had a higher priority than flow
director on ixgbe, priorities among filter types do not seem to be documented at
all in DPDK. This is one of the reasons I think we need a generic API to handle
flow configuration.
Totally agree with you. We haven't documented the info well enough. And even if we did, users would have to study the details of every NIC, which can still make the filters very hard to use. I believe a generic API is very helpful here :)

So, today an application cannot combine N-tuple and FDIR flow rules and get a
reliable outcome, unless it is designed for specific devices with a known
behavior.
What's the right behavior of the PMD if the APP wants to create a flow director
rule which has a higher or even equal priority than an existing n-tuple rule?
Should the PMD return a failure?

First remember applications only deal with the generic API, PMDs are
responsible for choosing the most appropriate HW implementation to use
according to the requested flow rules (FDIR, N-tuple or anything else).

For the specific case of FDIR vs N-tuple, if the underlying HW supports both I do
not see why the PMD would create a N-tuple rule. Doesn't FDIR support
everything N-tuple can do and much more?
Talking about the filters, fdir can cover n-tuple. I think that's why i40e only supports fdir but not n-tuple. But n-tuple has its own strength. As we know, at least on Intel NICs, fdir only supports a per-device mask, while n-tuple supports a per-rule mask.
As every pattern has both a spec and a mask, we cannot guarantee the masks are the same. I think ixgbe will try to use n-tuple first if it can, because even if the masks are different, we can support them all.
Assuming such a thing happened anyway, that the PMD had to create a rule
using a high priority filter type and that the application requests the creation of a
rule that can only be done using a lower priority filter type, but also requested a
higher priority for that rule, then yes, it should obviously fail.

That is, unless the PMD can perform some kind of workaround to have both.
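
The failure case described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: hw_filter_type, hw_rule and
check_priority_conflict() are hypothetical names, both rules are assumed to
overlap, a lower priority value is assumed to mean higher precedence, and the
fixed HW ordering (N-tuple before flow director) is the ixgbe behavior
mentioned in this thread. A PMD with a workaround would not need to fail.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical HW filter types. A larger value means the HW evaluates
 * that filter type first (ixgbe ordering as described in this thread). */
enum hw_filter_type {
    HW_FILTER_FDIR = 0,
    HW_FILTER_NTUPLE = 1,
};

/* Hypothetical PMD-side view of a rule once a filter type is chosen. */
struct hw_rule {
    enum hw_filter_type type;
    uint32_t priority; /* assumption: 0 is the highest priority */
};

/* Returns 0 if both overlapping rules can coexist as requested,
 * -EEXIST if the fixed HW ordering contradicts the requested one. */
int
check_priority_conflict(const struct hw_rule *existing,
                        const struct hw_rule *requested)
{
    int want_requested_first;
    int hw_requested_first;

    /* Same priority level: the outcome is undefined, nothing to enforce. */
    if (requested->priority == existing->priority)
        return 0;
    /* Same filter type: assume the HW can order rules within a type. */
    if (requested->type == existing->type)
        return 0;

    want_requested_first = requested->priority < existing->priority;
    hw_requested_first = requested->type > existing->type;
    return want_requested_first == hw_requested_first ? 0 : -EEXIST;
}
```

The check is symmetric on purpose: a new rule that would silently take
precedence over an existing higher priority rule is rejected as well, since
that would make the existing rule diverge from the spec.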
If so, do we need more failure reasons? According to this RFC, I think we need
to return "EEXIST: collision with an existing rule.", but it's not very clear:
the APP doesn't know the problem is the priority, so maybe a more detailed
reason would be helpful.

Possibly, I've defined a basic set of errors, there are quite a number of errno
values to choose from. However I think we should not define too many values.
In my opinion the basic set covers every possible failure:

- EINVAL: invalid format, rule is broken or cannot be understood by the PMD
  anyhow.

- ENOTSUP: pattern/actions look fine but something in the requested rule is
  not supported and thus cannot be applied.

- EEXIST: pattern/actions are fine and could have been applied if only some
  other rule did not prevent the PMD from doing so (I see it as the closest thing
  to "ETOOBAD" which unfortunately does not exist).

- ENOMEM: like EEXIST, except it is due to the lack of resources not because
  of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
  settled on ENOMEM as it is well known. Still open to debate.
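
As a rough illustration of how an application might act on this basic set,
here is a minimal sketch. The app_reaction values and
app_react_to_flow_errno() are hypothetical, as is the assumption that the API
reports failures as negative errno values; the suggested reactions are one
possible policy, not something this RFC mandates.

```c
#include <errno.h>

/* Hypothetical application-level reactions to a rule creation result. */
enum app_reaction {
    APP_RULE_ACCEPTED,          /* rule applied */
    APP_FIX_RULE,               /* rule is malformed, fix the caller */
    APP_HANDLE_IN_SW,           /* HW cannot do it, filter in software */
    APP_REVISIT_EXISTING_RULES, /* another rule is in the way */
    APP_FREE_RESOURCES,         /* out of resources, destroy some rules */
    APP_UNEXPECTED,             /* value outside the basic set */
};

/* Maps a return value (assumed to be 0 or a negative errno) to a reaction. */
enum app_reaction
app_react_to_flow_errno(int ret)
{
    switch (ret) {
    case 0:
        return APP_RULE_ACCEPTED;
    case -EINVAL:
        return APP_FIX_RULE;
    case -ENOTSUP:
        return APP_HANDLE_IN_SW;
    case -EEXIST:
        return APP_REVISIT_EXISTING_RULES;
    case -ENOMEM:
        return APP_FREE_RESOURCES;
    default:
        return APP_UNEXPECTED;
    }
}
```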

Errno values are only useful to get a rough idea of the reason, and another
mechanism is needed to pinpoint the exact problem for debugging/reporting
purposes, something like:

 enum rte_flow_error_type {
     RTE_FLOW_ERROR_TYPE_NONE,
     RTE_FLOW_ERROR_TYPE_UNKNOWN,
     RTE_FLOW_ERROR_TYPE_PRIORITY,
     RTE_FLOW_ERROR_TYPE_PATTERN,
     RTE_FLOW_ERROR_TYPE_ACTION,
 };

 struct rte_flow_error {
     enum rte_flow_error_type type;
     void *offset; /* Points to the exact pattern item or action. */
     const char *message;
 };
When we are using a CLI and it fails, normally it will let us know which parameter is not appropriate. So, I think it’s a good idea to have this error structure :)
Then either provide an optional struct rte_flow_error pointer to
rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
processing this may be quite expensive and applications may not care about the
exact reason.
Agree the processing may be too expensive. Maybe we can say it's optional to return error details. And it's a good question what the APP should do if creating the rule fails. I believe normally it will choose to handle the rule by itself. But I think it's not bad to give more feedback. Otherwise, even if the APP wants to adjust the rules, that is not an option for lack of info.
What do you suggest?
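
One way the optional pointer variant could look is sketched below. The enum
and struct are the ones proposed earlier in this message; sketch_validate()
and its priority-only check are hypothetical stand-ins, not the proposed
rte_flow_validate() prototype. Passing NULL lets applications that do not
care about the exact reason skip the detailed report.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* As proposed earlier in this message. */
enum rte_flow_error_type {
    RTE_FLOW_ERROR_TYPE_NONE,
    RTE_FLOW_ERROR_TYPE_UNKNOWN,
    RTE_FLOW_ERROR_TYPE_PRIORITY,
    RTE_FLOW_ERROR_TYPE_PATTERN,
    RTE_FLOW_ERROR_TYPE_ACTION,
};

struct rte_flow_error {
    enum rte_flow_error_type type;
    void *offset; /* Points to the exact pattern item or action. */
    const char *message;
};

/* Hypothetical validation step that only checks the priority level.
 * Error details are filled in only when the caller asks for them. */
int
sketch_validate(uint32_t priority, uint32_t max_priority,
                struct rte_flow_error *error)
{
    if (priority <= max_priority) {
        if (error != NULL) {
            error->type = RTE_FLOW_ERROR_TYPE_NONE;
            error->offset = NULL;
            error->message = NULL;
        }
        return 0;
    }
    if (error != NULL) {
        error->type = RTE_FLOW_ERROR_TYPE_PRIORITY;
        error->offset = NULL; /* no pattern item or action is at fault */
        error->message = "priority level not supported";
    }
    return -ENOTSUP;
}
```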
Behavior
--------

- API operations are synchronous and blocking (``EAGAIN`` cannot be
  returned).

- There is no provision for reentrancy/multi-thread safety, although nothing
  should prevent different devices from being configured at the same
  time. PMDs may protect their control path functions accordingly.

- Stopping the data path (TX/RX) should not be necessary when managing flow
  rules. If this cannot be achieved naturally or with workarounds (such as
  temporarily replacing the burst function pointers), an appropriate error
  code must be returned (``EBUSY``).
The PMD cannot stop the data path without adding a lock. So I think if some
rules cannot be applied without stopping rx/tx, the PMD has to return a failure.
Or let the APP stop the data path.
Agreed, that is the intent. If the PMD cannot touch flow rules for
some reason even after trying really hard, then it just returns EBUSY.

Perhaps we should write down that applications may get a different
outcome after stopping the data path if they get EBUSY?
Agree, it's better to describe more about the APP. BTW, I checked the
behavior of ixgbe/igb, I think we can add/delete filters during
runtime. Hopefully we'll not hit too many EBUSY problems on other NICs :)
OK, I will add it.
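
A minimal sketch of that application-side fallback, assuming a device that
refuses rule changes while its data path is running. Everything here is
hypothetical: fake_port and its helpers only model such a device and do not
correspond to any existing DPDK function.

```c
#include <errno.h>

/* Hypothetical device model: rules can only be added while stopped. */
struct fake_port {
    int running; /* nonzero while the data path is active */
    int rules;   /* number of rules currently applied */
};

int
fake_create_rule(struct fake_port *port)
{
    if (port->running)
        return -EBUSY; /* PMD could not touch flow rules at runtime */
    port->rules++;
    return 0;
}

void
fake_stop(struct fake_port *port)
{
    port->running = 0;
}

void
fake_start(struct fake_port *port)
{
    port->running = 1;
}

/* Try at runtime first; on EBUSY, stop the data path, retry, restart. */
int
create_rule_stopping_if_busy(struct fake_port *port)
{
    int ret = fake_create_rule(port);

    if (ret != -EBUSY)
        return ret;
    fake_stop(port);
    ret = fake_create_rule(port);
    fake_start(port); /* restore the data path whatever the outcome */
    return ret;
}
```

The application, not the PMD, decides whether the traffic interruption is
acceptable, which matches the idea that the PMD only reports EBUSY.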
- PMDs, not applications, are responsible for maintaining flow rules
  configuration when stopping and restarting a port or performing other
  actions which may affect them. They can only be destroyed explicitly.
Don’t understand " They can only be destroyed explicitly."
This part says that as long as an application has not called
rte_flow_destroy() on a flow rule, it never disappears, whatever
happens to the port (stopped, restarted). The application is not
responsible for re-creating rules after that.

Note that according to the specification, this may translate to not
being able to stop a port as long as a flow rule is present,
depending on how nice the PMD intends to be with applications.
Implementation can be done in small steps with minimal amount of code on
the PMD side.
Does it mean the PMD should store and maintain all the rules? Why not let rte do
that? I think if the PMD maintains all the rules, every kind of NIC needs its own
copy of the code for the rules. But if rte does that, only one copy of the code
needs to be maintained, right?

I've considered having rules stored in a common format understood at the RTE
level and not specific to each PMD and decided that the opaque rte_flow pointer
was a better choice for the following reasons:

- Even though flow rules management is done in the control path, processing
  must be as fast as possible. Letting PMDs store flow rules using their own
  internal representation gives them the chance to achieve better
  performance.
I don't quite understand. I think we're talking about maintaining the rules in SW. I don’t think there's anything that needs to be optimized for specific NICs. If we need to optimize the code, I think we need to consider the CPU, OS ... and some common means. Am I wrong?
- An opaque context managed by PMDs would probably have to be stored
  somewhere as well anyway.

- PMDs may not need to allocate/store anything at all if they exclusively
  rely on HW state for everything. In my opinion, the generic API has enough
  constraints for this to work and maintain consistency between flow
  rules. Note this is currently how most PMDs implement FDIR and other
  filter types.
Yes, the rules are stored by HW. But when the device is stopped and started, the rules in HW will be lost. We have to store the rules in SW and re-program them when restarting the device.
And in existing code, we store the filters in SW, at least on Intel NICs. But I think we cannot reuse them: considering the priority and which category of filter should be chosen, I think we need a whole new table for the generic API. I think that's what is designed now, right?
- RTE can (and will) provide helpers to avoid most of the code redundancy,
  PMDs are free to use them or manage everything by themselves.

- Given that the opaque rte_flow pointer associated with a flow rule is to
  be stored by the application, PMDs do not even have to keep references to
  them.
Don’t understand. More details?
- The flow rules format described in this specification (pattern / actions)
  will be used by applications directly, and they will be free to arrange rules
  in lists, trees or in any other way if they need to keep flow specifications
  around for further processing.
Who will create the lists, trees or something else? According to the previous discussion, I think the APP will program the rules one by one. So if the APP organizes the rules into lists, trees..., the PMD doesn’t know that.
And you said "Given that the opaque rte_flow pointer associated with a flow rule is to be stored by the application". I'm lost here.
When the port is stopped and restarted, rte can reconfigure the rules. Is the
concern that the PMD may adjust the sequence of the rules according to the
priority, so every NIC has a different list of rules? But the PMD can adjust
them again when rte is reconfiguring the rules.

What about PMDs able to stop and restart ports without destroying their own
flow rules? If we assume flow rules must be destroyed when stopping a port,
these PMDs are needlessly penalized with slower stop/start cycles. Think about
it assuming thousands of flow rules.
I believe the rules maintained by SW should not be destroyed, because they are used to re-program the device when it starts again.
Thus from an application point of view, whatever happens when stopping and
restarting a port should not matter. If a flow rule was present before, it must
still be present afterwards. If the PMD had to destroy flow rules and re-create
them, it does not actually matter if they differ slightly at the HW level, as long as:

- Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
  and refer to the same rules.

- The overall behavior of all rules is the same.
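
For a PMD whose HW loses its rules on stop, the two conditions above could be
met along these lines. This is a sketch only: the contents of struct rte_flow
are PMD-private and entirely hypothetical here, as are sketch_port and its
functions, and the HW is modeled by a simple flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PMD-private definition of the opaque handle. */
struct rte_flow {
    uint32_t priority;     /* SW copy of what the application requested */
    int in_hw;             /* nonzero while programmed into the fake HW */
    struct rte_flow *next; /* next rule kept by this PMD */
};

struct sketch_port {
    struct rte_flow *flows; /* SW list, survives stop/start cycles */
    int started;
};

/* Port stop: HW state is lost, SW copies and handles are kept. */
void
sketch_port_stop(struct sketch_port *port)
{
    struct rte_flow *f;

    for (f = port->flows; f != NULL; f = f->next)
        f->in_hw = 0;
    port->started = 0;
}

/* Port start: re-program every rule from its SW copy.
 * Returns the number of rules re-programmed. */
size_t
sketch_port_start(struct sketch_port *port)
{
    struct rte_flow *f;
    size_t count = 0;

    for (f = port->flows; f != NULL; f = f->next) {
        f->in_hw = 1;
        count++;
    }
    port->started = 1;
    return count;
}
```

Because the same objects are reused, pointers held by the application remain
valid and keep referring to the same rules. A PMD whose HW keeps its state
could skip both loops entirely.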

The list of rules you think of (patterns / actions) is maintained by applications
(not RTE), and only if they need them. RTE would needlessly duplicate this.
As said before, need more details to understand this. Maybe an example is better :)
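
A minimal sketch of what "maintained by applications" could mean in practice,
assuming the application only needs a flat table. The app_flow_* names and the
note field are hypothetical; struct rte_flow stays opaque, so the application
never looks inside the handle and only hands it back when destroying the rule.

```c
#include <errno.h>
#include <stddef.h>

struct rte_flow; /* opaque: only the PMD knows its contents */

#define APP_MAX_FLOWS 8

/* Hypothetical application-side record for one flow rule. */
struct app_flow_entry {
    struct rte_flow *handle; /* pointer returned when the rule was created */
    const char *note;        /* whatever the application wants to remember */
};

struct app_flow_table {
    struct app_flow_entry entries[APP_MAX_FLOWS];
    size_t count;
};

/* Remember a handle; returns 0 or -ENOMEM when the table is full. */
int
app_flow_table_add(struct app_flow_table *t, struct rte_flow *handle,
                   const char *note)
{
    if (t->count == APP_MAX_FLOWS)
        return -ENOMEM;
    t->entries[t->count].handle = handle;
    t->entries[t->count].note = note;
    t->count++;
    return 0;
}

/* Forget a handle (e.g. right after destroying the rule).
 * Returns 0 or -ENOENT if the handle is not in the table. */
int
app_flow_table_remove(struct app_flow_table *t, const struct rte_flow *handle)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        if (t->entries[i].handle == handle) {
            t->entries[i] = t->entries[t->count - 1];
            t->count--;
            return 0;
        }
    }
    return -ENOENT;
}
```

Nothing in this table changes when a port is stopped and restarted, and an
application that never needs to look its rules up again can skip it entirely.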
--
Adrien Mazarguil
6WIND