Thread (260 messages) 260 messages, 21 authors, 2017-11-14

Re: [RFC] Generic flow director/filtering/classification API

From: Chandran, Sugesh <hidden>
Date: 2016-07-11 10:43:26

Hi Adrien,

Thank you for your response,
Please see my comments inline.

Regards
_Sugesh

-----Original Message-----
From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
Sent: Friday, July 8, 2016 2:03 PM
To: Chandran, Sugesh <redacted>
Cc: dev@dpdk.org; Thomas Monjalon <redacted>;
Zhang, Helin [off-list ref]; Wu, Jingjing
[off-list ref]; Rasesh Mody [off-list ref]; Ajit
Khaparde [off-list ref]; Rahul Lakkireddy
[off-list ref]; Lu, Wenzhuo [off-list ref];
Jan Medala [off-list ref]; John Daley [off-list ref]; Chen,
Jing D [off-list ref]; Ananyev, Konstantin
[off-list ref]; Matej Vido [off-list ref];
Alejandro Lucero [off-list ref]; Sony Chacko
[off-list ref]; Jerin Jacob
[off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Olga Shern [off-list ref]
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification
API

Hi Sugesh,

On Thu, Jul 07, 2016 at 11:15:07PM +0000, Chandran, Sugesh wrote:
quoted
Hi Adrien,

Thank you for proposing this. It would be really useful for application such
as OVS-DPDK.
quoted
Please find my comments and questions inline below prefixed with
[Sugesh]. Most of them are from the perspective of enabling these APIs in
application such as OVS-DPDK.

Thanks, I'm replying below.
quoted
quoted
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien
Mazarguil
quoted
quoted
Sent: Tuesday, July 5, 2016 7:17 PM
To: dev@dpdk.org
Cc: Thomas Monjalon <redacted>; Zhang, Helin
[off-list ref]; Wu, Jingjing [off-list ref]; Rasesh
Mody [off-list ref]; Ajit Khaparde
[off-list ref]; Rahul Lakkireddy
[off-list ref]; Lu, Wenzhuo
[off-list ref];
quoted
quoted
Jan Medala [off-list ref]; John Daley [off-list ref];
Chen,
quoted
quoted
Jing D [off-list ref]; Ananyev, Konstantin
[off-list ref]; Matej Vido [off-list ref];
Alejandro Lucero [off-list ref]; Sony Chacko
[off-list ref]; Jerin Jacob
[off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Olga Shern [off-list ref]
Subject: [dpdk-dev] [RFC] Generic flow director/filtering/classification
API
quoted
quoted
<<<<<----Snipped out ---->>>>>
quoted
quoted
Flow director
-------------

Flow director (FDIR) is the name of the most capable filter type, which
covers most features offered by others. As such, it is the most
widespread
quoted
quoted
in PMDs that support filtering (i.e. all of them besides **e1000**).

It is also the only type that allows an arbitrary 32 bits value provided by
applications to be attached to a filter and returned with matching packets
instead of relying on the destination queue to recognize flows.

Unfortunately, even FDIR requires applications to be aware of low-level
capabilities and limitations (most of which come directly from **ixgbe**
and
quoted
quoted
**i40e**):

- Bitmasks are set globally per device (port?), not per filter.
[Sugesh] This means application cannot define filters that matches on
arbitrary different offsets?
quoted
If that’s the case, I assume the application has to program bitmask in
advance. Otherwise how
quoted
the API framework deduce this bitmask information from the rules?? Its
not very clear to me
quoted
that how application pass down the bitmask information for multiple filters
on same port?

This is my understanding of how flow director currently works, perhaps
someome more familiar with it can answer this question better than I could.

Let me take an example, if particular device can only handle a single IPv4
mask common to all flow rules (say only to match destination addresses),
updating that mask to also match the source address affects all defined and
future flow rules simultaneously.

That is how FDIR currently works and I think it is wrong, as it penalizes
devices that do support individual bit-masks per rule, and is a little
awkward from an application point of view.

What I suggest for the new API instead is the ability to specify one
bit-mask per rule, and let the PMD deal with HW limitations by automatically
configuring global bitmasks from the first added rule, then refusing to add
subsequent rules if they specify a conflicting bit-mask. Existing rules
remain unaffected that way, and applications do not have to be extra
cautious.
[Sugesh] The issue with that approach is, the hardware simply discards the rule
when it is a super set of first one eventhough the hardware is capable of 
handling it. How its guaranteed the first rule will set the bitmask for all the
subsequent rules. 
How about having a CLASSIFER_TYPE for the classifier. Every port can have 
set of supported flow types(for eg: L3_TYPE, L4_TYPE, L4_TYPE_8BYTE_FLEX,
L4_TYPE_16BYTE_FLEX) based on the underlying FDIR support. Application can query 
this and set the type accordingly while initializing the port. This way the first rule need 
not set all the bits that may needed in the future rules. 
 
quoted
quoted
``PASSTHRU``
^^^^^^^^^^^^

Leaves packets up for additional processing by subsequent flow rules.
This
quoted
quoted
is the default when a rule does not contain a terminating action, but can
be
quoted
quoted
specified to force a rule to become non-terminating.

- No configurable property.

+---------------+
| PASSTHRU      |
+===============+
| no properties |
+---------------+

Example to copy a packet to a queue and continue processing by
subsequent
quoted
quoted
flow rules:
[Sugesh] If a packet get copied to a queue, it’s a termination action.
How can its possible to do subsequent action after the packet already
moved to the queue. ?How it differs from DUP action?
 Am I missing anything here?
Devices may not support the combination of QUEUE + PASSTHRU (i.e. making
QUEUE non-terminating). However these same devices may expose the
ability to
copy a packet to another (sniffer) queue all while keeping the rule
terminating (QUEUE + DUP but no PASSTHRU).

DUP with two rules, assuming priorties and PASSTRHU are supported:

- pattern X, priority 0; actions: QUEUE 5, PASSTHRU (non-terminating)

- pattern X, priority 1; actions: QUEUE 6 (terminating)

DUP with two actions on a single rule and a single priority:

- pattern X, priority 0; actions: DUP 5, QUEUE 6 (terminating)

If supported, from an application point of view the end result is similar in
both cases (note the second case may be implemented by the PMD using
two HW
rules internally).

However the second case does not waste a priority level and clearly states
the intent to the PMD which is more likely to be supported. If HW supports
DUP directly it is even faster since there is a single rule. That is why I
thought having DUP as an action would be useful.
[Sugesh] Thank you for the clarification. It make sense to me now.
quoted
quoted
+--------------------------+
| Copy to queue 8          |
+==========+===============+
| PASSTHRU |               |
+----------+-----------+---+
| QUEUE    | ``queue`` | 8 |
+----------+-----------+---+

``ID``
^^^^^^

Attaches a 32 bit value to packets.

+----------------------------------------------+
| ID                                           |
+========+=====================================+
| ``id`` | 32 bit value to return with packets |
+--------+-------------------------------------+
[Sugesh] I assume the application has to program the flow
with a unique ID and matching packets are stamped with this ID
when reporting to the software. The uniqueness of ID is NOT
guaranteed by the API framework. Correct me if I am wrong here.
You are right, if the way I wrote it is not clear enough, I'm open to
suggestions to improve it.
[Sugesh] I guess its fine and would like to confirm the same. Perhaps
it would be nice to mention that the IDs are application defined.
quoted
[Sugesh] Is it a limitation to use only 32 bit ID? Is it possible to have a
64 bit ID? So that application can use the control plane flow pointer
Itself as an ID. Does it make sense?
I've specified a 32 bit ID for now because this is what FDIR supports and
also what existing devices can report today AFAIK (i40e and mlx5).

We could use 64 bit for future-proofness in a separate action like "ID64"
when at least one device supports it.

To PMD maintainers: please comment if you know devices that support
tagging
matching packets with more than 32 bits of user-provided data!
[Sugesh] I guess the flow director ID is 64 bit , The XL710 datasheet says so.
And in the 'rte_mbuf' structure the 64 bit FDIR-ID is shared with rss hash. This can be
a software driver limitation that expose only 32 bit. Possibly because of cache 
alignment issues? Since the hardware can support 64 bit, I feel it make sense 
to support 64 bit as well.
quoted
quoted
.. raw:: pdf

   PageBreak

``QUEUE``
^^^^^^^^^

Assigns packets to a given queue index.

- Terminating by default.

+--------------------------------+
| QUEUE                          |
+===========+====================+
| ``queue`` | queue index to use |
+-----------+--------------------+

``DROP``
^^^^^^^^

Drop packets.

- No configurable property.
- Terminating by default.
- PASSTHRU overrides this action if both are specified.

+---------------+
| DROP          |
+===============+
| no properties |
+---------------+

``COUNT``
^^^^^^^^^
[Sugesh] Should we really have to set count action explicitly for every rule?
IMHO it would be great to be an implicit action. Most of the application
would be
quoted
interested in the stats of almost all the filters/flows .
I can see why, but no, it must be explicitly requested because you may want
to know in advance when it is not supported. Also considering it is
something else to be done by HW (a separate action), we can assume
enabling
this may slow things down a bit.

HW limitations may also prevent you from having as many flow counters as
you
want, in which case you probably want to carefully pick which rules have
them.

I think this target is most useful with DROP, VF and PF actions since
those are currently the only ones where SW may not see the related
packets.
[Sugesh] Agreed and thanks for the clarification.
quoted
quoted
Enables hits counter for this rule.

This counter can be retrieved and reset through ``rte_flow_query()``, see
``struct rte_flow_query_count``.

- Counters can be retrieved with ``rte_flow_query()``.
- No configurable property.

+---------------+
| COUNT         |
+===============+
| no properties |
+---------------+

Query structure to retrieve and reset the flow rule hits counter:

+------------------------------------------------+
| COUNT query                                    |
+===========+=====+==============================+
| ``reset`` | in  | reset counter after query    |
+-----------+-----+------------------------------+
| ``hits``  | out | number of hits for this flow |
+-----------+-----+------------------------------+
<<<<<<<<Snipped out >>>>
quoted
quoted
::

 struct rte_flow *
 rte_flow_create(uint8_t port_id,
                 const struct rte_flow_pattern *pattern,
                 const struct rte_flow_actions *actions);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``pattern``: pattern specification to add.
- ``actions``: actions associated with the flow definition.

Return value:

A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is
set to the positive version of one of the error codes defined for
``rte_flow_validate()``.
[Sugesh] : Kind of implementation specific query. What if application
try to add duplicate rules? Does the API create new flow entry for every
API call?
If an application adds duplicate rules at a given priority level, the second
one may return an error depending on the PMD. Collisions are sometimes
trivial to detect (such as the same pattern twice), others not so much (one
matching an Ethernet header only, the other one matching an IP header
only).

Either way if a packet is matched by two rules at a given priority level,
what happens is described in 3.3 (High level design) and 4.4.1 (Priorities).

Applications are responsible for not relying on the PMD to detect these, or
should use a single priority level for each rule to make things clear.

However since the number of HW priority levels is finite and possibly small,
they must also make sure not to waste them. My advice is to only use
priority levels when it cannot be proven that rules do not collide.

If all you have is perfect matching rules without wildcards and all of them
match the same number of layers, a single priority level is fine.
[Sugesh] Make sense. Its fine from my prespective.
quoted
[Sugesh] Another concern is the cost and time of installing these rules
in the hardware. Can we make these APIs time bound(or at least an option
to
quoted
set the time limit to execute these APIs), so that
Application doesn’t have to wait so long when installing and deleting flows
with
quoted
slow hardware/NIC. What do you think? Most of the datapath flow
installations are
quoted
dynamic and triggered only when there is
an ingress traffic. Delay in flow insertion/deletion have unpredictable
consequences.

This API is (currently) aimed at the control path only, and must indeed be
assumed to be slow. Creating million of rules may take quite long as it may
involve syscalls and other time-consuming synchronization things on the
PMD
side.

So currently there is no plan to have rules added from the data path with
time constraints. I think it would be implemented through a different set of
functions anyway.

I do not think adding time limits is practical, even specifying in the API
that creating a single flow rule must take less than a maximum number of
seconds in order to be effective is too much of a constraint (applications
that create all flows during init may not care after all).

You should consider in any case that modifying flow rules will always be
slower than receiving packets, there is no way around that. Applications
have to live with it and provide a software fallback for incoming packets
while managing flow rules.

Moreover, think about what happens when you hit the maximum number of
flow
rules and cannot create any more. Applications need to implement some
kind
of fallback in their data path.

Offloading flows in HW is also only useful if they live much longer than the
time taken to create and delete them. Perhaps applications may choose to
do
so after detecting long lived flows such as TCP sessions.

You may have one separate control thread dedicated to manage flows and
keep your normal control thread unaffected by delays. Several threads can
even be dedicated, one per device.
[Sugesh] I agree that the flow insertion cannot be as fast as the packet receiving 
rate.  From application point of view the problem will be when hardware flow 
insertion takes longer than software flow insertion. At least application has to know
the cost of inserting/deleting a rule in hardware beforehand. Otherwise how application
can choose the right flow candidate for hardware. My point here is application is expecting 
a deterministic behavior from a classifier while inserting and deleting rules.
quoted
[Sugesh] Another query is on the synchronization part. What if same rules
are
quoted
handled from different threads? Is application responsible for handling the
concurrent
quoted
hardware programming?
Like most (if not all) DPDK APIs, applications are responsible for managing
locking issues as decribed in 4.3 (Behavior). Since this is a control path
API and applications usually have a single control thread, locking should
not be necessary in most cases.

Regarding my above comment about using several control threads to
manage
different devices, section 4.3 says:

 "There is no provision for reentrancy/multi-thread safety, although nothing
 should prevent different devices from being configured at the same
 time. PMDs may protect their control path functions accordingly."

I'd like to emphasize it is not "per port" but "per device", since in a few
cases a configurable resource is shared by several ports. It may be
difficult for applications to determine which ports are shared by a given
device but this falls outside the scope of this API.

Do you think adding the guarantee that it is always safe to configure two
different ports simultaneously without locking from the application side is
necessary? In which case the PMD would be responsible for locking shared
resources.
[Sugesh] This would be little bit complicated when some of ports are not under 
DPDK itself(what if one port is managed by Kernel) Or ports are tied by 
different application. Locking in PMD helps when the ports are accessed by 
multiple DPDK application. However what if the port itself not under DPDK?
quoted
quoted
Destruction
~~~~~~~~~~~

Flow rules destruction is not automatic, and a queue should not be
released
quoted
quoted
if any are still attached to it. Applications must take care of performing
this step before releasing resources.

::

 int
 rte_flow_destroy(uint8_t port_id,
                  struct rte_flow *flow);
[Sugesh] I would suggest having a clean-up API is really useful as the
releasing of
quoted
Queue(is it applicable for releasing of port too?) is not guaranteeing the
automatic flow
quoted
destruction.
Would something like rte_flow_flush(port_id) do the trick? I wanted to
emphasize in this first draft that applications should really keep the flow
pointers around in order to manage/destroy them. It is their responsibility,
not PMD's.
[Sugesh] Thanks, I think the flush call will do.
quoted
This way application can initialize the port,
clean-up all the existing rules and create new rules  on a clean slate.
No resource can be released as long as a flow rule is using it (bad things
may happen otherwise), all flow rules must be destroyed first, thus none can
possibly remain after initializing a port. It is assumed that PMDs do
automatic clean up during init if necessary to ensure this.
[Sugesh] That will do.
quoted
quoted
Failure to destroy a flow rule may occur when other flow rules depend on
it,
quoted
quoted
and destroying it would result in an inconsistent state.

This function is only guaranteed to succeed if flow rules are destroyed in
reverse order of their creation.

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule to destroy.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.

.. raw:: pdf

   PageBreak

Query
~~~~~

Query an existing flow rule.

This function allows retrieving flow-specific data such as counters. Data
is gathered by special actions which must be present in the flow rule
definition.

::

 int
 rte_flow_query(uint8_t port_id,
                struct rte_flow *flow,
                enum rte_flow_action_type action,
                void *data);

Arguments:

- ``port_id``: port identifier of Ethernet device.
- ``flow``: flow rule to query.
- ``action``: action type to query.
- ``data``: pointer to storage for the associated query data type.

Return value:

- **0** on success, a negative errno value otherwise and ``rte_errno`` is
  set.

.. raw:: pdf

   PageBreak

Behavior
--------

- API operations are synchronous and blocking (``EAGAIN`` cannot be
  returned).

- There is no provision for reentrancy/multi-thread safety, although
nothing
quoted
quoted
  should prevent different devices from being configured at the same
  time. PMDs may protect their control path functions accordingly.

- Stopping the data path (TX/RX) should not be necessary when managing
flow
  rules. If this cannot be achieved naturally or with workarounds (such as
  temporarily replacing the burst function pointers), an appropriate error
  code must be returned (``EBUSY``).

- PMDs, not applications, are responsible for maintaining flow rules
  configuration when stopping and restarting a port or performing other
  actions which may affect them. They can only be destroyed explicitly.

.. raw:: pdf

   PageBreak
[Sugesh] Query all the rules for a specific port/queue?? Useful when
adding and
quoted
deleting ports and queues dynamically according to the need. I am not sure
what are the other  different usecases for these APIs. But I feel it makes
much easier to
quoted
manage flows from the application. What do you think?
Not sure, that seems to fall out of the scope of this API. As described,
applications already store the related rte_flow pointers. Accordingly, they
know how many rules are associated to a given port. They need both a port
ID
and a flow rule pointer to destroy them after all.

Now perhaps something to convert back an existing rte_flow to a pattern
and
a list of actions, however I cannot see an immediate use case for it.

What you describe seems to be doable through a front-end API, I think
keeping this one as low-level as possible with only basic actions is better
right now. I'll keep your suggestion in mind.
[Sugesh] Sure, That will be fine.
quoted
quoted
Compatibility
-------------

No known hardware implementation supports all the features described
in
quoted
quoted
this
document.

Unsupported features or combinations are not expected to be fully
emulated
in software by PMDs for performance reasons. Partially supported
features
quoted
quoted
may be completed in software as long as hardware performs most of the
work
(such as queue redirection and packet recognition).

However PMDs are expected to do their best to satisfy application
requests
quoted
quoted
by working around hardware limitations as long as doing so does not
affect
quoted
quoted
the behavior of existing flow rules.

The following sections provide a few examples of such cases, they are
based
Adrien Mazarguil
6WIND
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help