Re: [RFC] Generic flow director/filtering/classification API
From: Chandran, Sugesh <hidden>
Date: 2016-07-11 10:43:26
Hi Adrien, Thank you for your response, Please see my comments inline. Regards _Sugesh
-----Original Message----- From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com] Sent: Friday, July 8, 2016 2:03 PM To: Chandran, Sugesh <redacted> Cc: dev@dpdk.org; Thomas Monjalon <redacted>; Zhang, Helin [off-list ref]; Wu, Jingjing [off-list ref]; Rasesh Mody [off-list ref]; Ajit Khaparde [off-list ref]; Rahul Lakkireddy [off-list ref]; Lu, Wenzhuo [off-list ref]; Jan Medala [off-list ref]; John Daley [off-list ref]; Chen, Jing D [off-list ref]; Ananyev, Konstantin [off-list ref]; Matej Vido [off-list ref]; Alejandro Lucero [off-list ref]; Sony Chacko [off-list ref]; Jerin Jacob [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Olga Shern [off-list ref] Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API Hi Sugesh, On Thu, Jul 07, 2016 at 11:15:07PM +0000, Chandran, Sugesh wrote:quoted
Hi Adrien, Thank you for proposing this. It would be really useful for application suchas OVS-DPDK.quoted
Please find my comments and questions inline below prefixed with[Sugesh]. Most of them are from the perspective of enabling these APIs in application such as OVS-DPDK. Thanks, I'm replying below.quoted
quoted
-----Original Message----- From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of AdrienMazarguilquoted
quoted
Sent: Tuesday, July 5, 2016 7:17 PM To: dev@dpdk.org Cc: Thomas Monjalon <redacted>; Zhang, Helin [off-list ref]; Wu, Jingjing [off-list ref]; Rasesh Mody [off-list ref]; Ajit Khaparde [off-list ref]; Rahul Lakkireddy [off-list ref]; Lu, Wenzhuo[off-list ref];quoted
quoted
Jan Medala [off-list ref]; John Daley [off-list ref];Chen,quoted
quoted
Jing D [off-list ref]; Ananyev, Konstantin [off-list ref]; Matej Vido [off-list ref]; Alejandro Lucero [off-list ref]; Sony Chacko [off-list ref]; Jerin Jacob [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Olga Shern [off-list ref] Subject: [dpdk-dev] [RFC] Generic flow director/filtering/classificationAPIquoted
quoted
<<<<<----Snipped out ---->>>>>
quoted
quoted
Flow director ------------- Flow director (FDIR) is the name of the most capable filter type, which covers most features offered by others. As such, it is the mostwidespreadquoted
quoted
in PMDs that support filtering (i.e. all of them besides **e1000**). It is also the only type that allows an arbitrary 32 bits value provided by applications to be attached to a filter and returned with matching packets instead of relying on the destination queue to recognize flows. Unfortunately, even FDIR requires applications to be aware of low-level capabilities and limitations (most of which come directly from **ixgbe**andquoted
quoted
**i40e**): - Bitmasks are set globally per device (port?), not per filter.[Sugesh] This means application cannot define filters that matches onarbitrary different offsets?quoted
If that’s the case, I assume the application has to program bitmask inadvance. Otherwise howquoted
the API framework deduce this bitmask information from the rules?? Itsnot very clear to mequoted
that how application pass down the bitmask information for multiple filterson same port? This is my understanding of how flow director currently works, perhaps someome more familiar with it can answer this question better than I could. Let me take an example, if particular device can only handle a single IPv4 mask common to all flow rules (say only to match destination addresses), updating that mask to also match the source address affects all defined and future flow rules simultaneously. That is how FDIR currently works and I think it is wrong, as it penalizes devices that do support individual bit-masks per rule, and is a little awkward from an application point of view. What I suggest for the new API instead is the ability to specify one bit-mask per rule, and let the PMD deal with HW limitations by automatically configuring global bitmasks from the first added rule, then refusing to add subsequent rules if they specify a conflicting bit-mask. Existing rules remain unaffected that way, and applications do not have to be extra cautious.
[Sugesh] The issue with that approach is, the hardware simply discards the rule when it is a super set of first one eventhough the hardware is capable of handling it. How its guaranteed the first rule will set the bitmask for all the subsequent rules. How about having a CLASSIFER_TYPE for the classifier. Every port can have set of supported flow types(for eg: L3_TYPE, L4_TYPE, L4_TYPE_8BYTE_FLEX, L4_TYPE_16BYTE_FLEX) based on the underlying FDIR support. Application can query this and set the type accordingly while initializing the port. This way the first rule need not set all the bits that may needed in the future rules.
quoted
quoted
``PASSTHRU`` ^^^^^^^^^^^^ Leaves packets up for additional processing by subsequent flow rules.Thisquoted
quoted
is the default when a rule does not contain a terminating action, but canbequoted
quoted
specified to force a rule to become non-terminating. - No configurable property. +---------------+ | PASSTHRU | +===============+ | no properties | +---------------+ Example to copy a packet to a queue and continue processing bysubsequentquoted
quoted
flow rules:[Sugesh] If a packet get copied to a queue, it’s a termination action. How can its possible to do subsequent action after the packet already moved to the queue. ?How it differs from DUP action? Am I missing anything here?Devices may not support the combination of QUEUE + PASSTHRU (i.e. making QUEUE non-terminating). However these same devices may expose the ability to copy a packet to another (sniffer) queue all while keeping the rule terminating (QUEUE + DUP but no PASSTHRU). DUP with two rules, assuming priorties and PASSTRHU are supported: - pattern X, priority 0; actions: QUEUE 5, PASSTHRU (non-terminating) - pattern X, priority 1; actions: QUEUE 6 (terminating) DUP with two actions on a single rule and a single priority: - pattern X, priority 0; actions: DUP 5, QUEUE 6 (terminating) If supported, from an application point of view the end result is similar in both cases (note the second case may be implemented by the PMD using two HW rules internally). However the second case does not waste a priority level and clearly states the intent to the PMD which is more likely to be supported. If HW supports DUP directly it is even faster since there is a single rule. That is why I thought having DUP as an action would be useful.
[Sugesh] Thank you for the clarification. It make sense to me now.
quoted
quoted
+--------------------------+ | Copy to queue 8 | +==========+===============+ | PASSTHRU | | +----------+-----------+---+ | QUEUE | ``queue`` | 8 | +----------+-----------+---+ ``ID`` ^^^^^^ Attaches a 32 bit value to packets. +----------------------------------------------+ | ID | +========+=====================================+ | ``id`` | 32 bit value to return with packets | +--------+-------------------------------------+[Sugesh] I assume the application has to program the flow with a unique ID and matching packets are stamped with this ID when reporting to the software. The uniqueness of ID is NOT guaranteed by the API framework. Correct me if I am wrong here.You are right, if the way I wrote it is not clear enough, I'm open to suggestions to improve it.
[Sugesh] I guess its fine and would like to confirm the same. Perhaps it would be nice to mention that the IDs are application defined.
quoted
[Sugesh] Is it a limitation to use only 32 bit ID? Is it possible to have a 64 bit ID? So that application can use the control plane flow pointer Itself as an ID. Does it make sense?I've specified a 32 bit ID for now because this is what FDIR supports and also what existing devices can report today AFAIK (i40e and mlx5). We could use 64 bit for future-proofness in a separate action like "ID64" when at least one device supports it. To PMD maintainers: please comment if you know devices that support tagging matching packets with more than 32 bits of user-provided data!
[Sugesh] I guess the flow director ID is 64 bit , The XL710 datasheet says so. And in the 'rte_mbuf' structure the 64 bit FDIR-ID is shared with rss hash. This can be a software driver limitation that expose only 32 bit. Possibly because of cache alignment issues? Since the hardware can support 64 bit, I feel it make sense to support 64 bit as well.
quoted
quoted
.. raw:: pdf PageBreak ``QUEUE`` ^^^^^^^^^ Assigns packets to a given queue index. - Terminating by default. +--------------------------------+ | QUEUE | +===========+====================+ | ``queue`` | queue index to use | +-----------+--------------------+ ``DROP`` ^^^^^^^^ Drop packets. - No configurable property. - Terminating by default. - PASSTHRU overrides this action if both are specified. +---------------+ | DROP | +===============+ | no properties | +---------------+ ``COUNT`` ^^^^^^^^^[Sugesh] Should we really have to set count action explicitly for every rule? IMHO it would be great to be an implicit action. Most of the applicationwould bequoted
interested in the stats of almost all the filters/flows .I can see why, but no, it must be explicitly requested because you may want to know in advance when it is not supported. Also considering it is something else to be done by HW (a separate action), we can assume enabling this may slow things down a bit. HW limitations may also prevent you from having as many flow counters as you want, in which case you probably want to carefully pick which rules have them. I think this target is most useful with DROP, VF and PF actions since those are currently the only ones where SW may not see the related packets.
[Sugesh] Agreed and thanks for the clarification.
quoted
quoted
Enables hits counter for this rule. This counter can be retrieved and reset through ``rte_flow_query()``, see ``struct rte_flow_query_count``. - Counters can be retrieved with ``rte_flow_query()``. - No configurable property. +---------------+ | COUNT | +===============+ | no properties | +---------------+ Query structure to retrieve and reset the flow rule hits counter: +------------------------------------------------+ | COUNT query | +===========+=====+==============================+ | ``reset`` | in | reset counter after query | +-----------+-----+------------------------------+ | ``hits`` | out | number of hits for this flow | +-----------+-----+------------------------------+
<<<<<<<<Snipped out >>>>
quoted
quoted
:: struct rte_flow * rte_flow_create(uint8_t port_id, const struct rte_flow_pattern *pattern, const struct rte_flow_actions *actions); Arguments: - ``port_id``: port identifier of Ethernet device. - ``pattern``: pattern specification to add. - ``actions``: actions associated with the flow definition. Return value: A valid flow pointer in case of success, NULL otherwise and ``rte_errno`` is set to the positive version of one of the error codes defined for ``rte_flow_validate()``.[Sugesh] : Kind of implementation specific query. What if application try to add duplicate rules? Does the API create new flow entry for every API call?If an application adds duplicate rules at a given priority level, the second one may return an error depending on the PMD. Collisions are sometimes trivial to detect (such as the same pattern twice), others not so much (one matching an Ethernet header only, the other one matching an IP header only). Either way if a packet is matched by two rules at a given priority level, what happens is described in 3.3 (High level design) and 4.4.1 (Priorities). Applications are responsible for not relying on the PMD to detect these, or should use a single priority level for each rule to make things clear. However since the number of HW priority levels is finite and possibly small, they must also make sure not to waste them. My advice is to only use priority levels when it cannot be proven that rules do not collide. If all you have is perfect matching rules without wildcards and all of them match the same number of layers, a single priority level is fine.
[Sugesh] Make sense. Its fine from my prespective.
quoted
[Sugesh] Another concern is the cost and time of installing these rules in the hardware. Can we make these APIs time bound(or at least an optiontoquoted
set the time limit to execute these APIs), so that Application doesn’t have to wait so long when installing and deleting flowswithquoted
slow hardware/NIC. What do you think? Most of the datapath flowinstallations arequoted
dynamic and triggered only when there is an ingress traffic. Delay in flow insertion/deletion have unpredictableconsequences. This API is (currently) aimed at the control path only, and must indeed be assumed to be slow. Creating million of rules may take quite long as it may involve syscalls and other time-consuming synchronization things on the PMD side. So currently there is no plan to have rules added from the data path with time constraints. I think it would be implemented through a different set of functions anyway. I do not think adding time limits is practical, even specifying in the API that creating a single flow rule must take less than a maximum number of seconds in order to be effective is too much of a constraint (applications that create all flows during init may not care after all). You should consider in any case that modifying flow rules will always be slower than receiving packets, there is no way around that. Applications have to live with it and provide a software fallback for incoming packets while managing flow rules. Moreover, think about what happens when you hit the maximum number of flow rules and cannot create any more. Applications need to implement some kind of fallback in their data path. Offloading flows in HW is also only useful if they live much longer than the time taken to create and delete them. Perhaps applications may choose to do so after detecting long lived flows such as TCP sessions. You may have one separate control thread dedicated to manage flows and keep your normal control thread unaffected by delays. Several threads can even be dedicated, one per device.
[Sugesh] I agree that the flow insertion cannot be as fast as the packet receiving rate. From application point of view the problem will be when hardware flow insertion takes longer than software flow insertion. At least application has to know the cost of inserting/deleting a rule in hardware beforehand. Otherwise how application can choose the right flow candidate for hardware. My point here is application is expecting a deterministic behavior from a classifier while inserting and deleting rules.
quoted
[Sugesh] Another query is on the synchronization part. What if same rulesarequoted
handled from different threads? Is application responsible for handling theconcurrentquoted
hardware programming?Like most (if not all) DPDK APIs, applications are responsible for managing locking issues as decribed in 4.3 (Behavior). Since this is a control path API and applications usually have a single control thread, locking should not be necessary in most cases. Regarding my above comment about using several control threads to manage different devices, section 4.3 says: "There is no provision for reentrancy/multi-thread safety, although nothing should prevent different devices from being configured at the same time. PMDs may protect their control path functions accordingly." I'd like to emphasize it is not "per port" but "per device", since in a few cases a configurable resource is shared by several ports. It may be difficult for applications to determine which ports are shared by a given device but this falls outside the scope of this API. Do you think adding the guarantee that it is always safe to configure two different ports simultaneously without locking from the application side is necessary? In which case the PMD would be responsible for locking shared resources.
[Sugesh] This would be little bit complicated when some of ports are not under DPDK itself(what if one port is managed by Kernel) Or ports are tied by different application. Locking in PMD helps when the ports are accessed by multiple DPDK application. However what if the port itself not under DPDK?
quoted
quoted
Destruction ~~~~~~~~~~~ Flow rules destruction is not automatic, and a queue should not bereleasedquoted
quoted
if any are still attached to it. Applications must take care of performing this step before releasing resources. :: int rte_flow_destroy(uint8_t port_id, struct rte_flow *flow);[Sugesh] I would suggest having a clean-up API is really useful as thereleasing ofquoted
Queue(is it applicable for releasing of port too?) is not guaranteeing theautomatic flowquoted
destruction.Would something like rte_flow_flush(port_id) do the trick? I wanted to emphasize in this first draft that applications should really keep the flow pointers around in order to manage/destroy them. It is their responsibility, not PMD's.
[Sugesh] Thanks, I think the flush call will do.
quoted
This way application can initialize the port, clean-up all the existing rules and create new rules on a clean slate.No resource can be released as long as a flow rule is using it (bad things may happen otherwise), all flow rules must be destroyed first, thus none can possibly remain after initializing a port. It is assumed that PMDs do automatic clean up during init if necessary to ensure this.
[Sugesh] That will do.
quoted
quoted
Failure to destroy a flow rule may occur when other flow rules depend onit,quoted
quoted
and destroying it would result in an inconsistent state. This function is only guaranteed to succeed if flow rules are destroyed in reverse order of their creation. Arguments: - ``port_id``: port identifier of Ethernet device. - ``flow``: flow rule to destroy. Return value: - **0** on success, a negative errno value otherwise and ``rte_errno`` is set. .. raw:: pdf PageBreak Query ~~~~~ Query an existing flow rule. This function allows retrieving flow-specific data such as counters. Data is gathered by special actions which must be present in the flow rule definition. :: int rte_flow_query(uint8_t port_id, struct rte_flow *flow, enum rte_flow_action_type action, void *data); Arguments: - ``port_id``: port identifier of Ethernet device. - ``flow``: flow rule to query. - ``action``: action type to query. - ``data``: pointer to storage for the associated query data type. Return value: - **0** on success, a negative errno value otherwise and ``rte_errno`` is set. .. raw:: pdf PageBreak Behavior -------- - API operations are synchronous and blocking (``EAGAIN`` cannot be returned). - There is no provision for reentrancy/multi-thread safety, althoughnothingquoted
quoted
should prevent different devices from being configured at the same time. PMDs may protect their control path functions accordingly. - Stopping the data path (TX/RX) should not be necessary when managing flow rules. If this cannot be achieved naturally or with workarounds (such as temporarily replacing the burst function pointers), an appropriate error code must be returned (``EBUSY``). - PMDs, not applications, are responsible for maintaining flow rules configuration when stopping and restarting a port or performing other actions which may affect them. They can only be destroyed explicitly. .. raw:: pdf PageBreak[Sugesh] Query all the rules for a specific port/queue?? Useful whenadding andquoted
deleting ports and queues dynamically according to the need. I am not sure what are the other different usecases for these APIs. But I feel it makesmuch easier toquoted
manage flows from the application. What do you think?Not sure, that seems to fall out of the scope of this API. As described, applications already store the related rte_flow pointers. Accordingly, they know how many rules are associated to a given port. They need both a port ID and a flow rule pointer to destroy them after all. Now perhaps something to convert back an existing rte_flow to a pattern and a list of actions, however I cannot see an immediate use case for it. What you describe seems to be doable through a front-end API, I think keeping this one as low-level as possible with only basic actions is better right now. I'll keep your suggestion in mind.
[Sugesh] Sure, That will be fine.
quoted
quoted
Compatibility ------------- No known hardware implementation supports all the features describedinquoted
quoted
this document. Unsupported features or combinations are not expected to be fully emulated in software by PMDs for performance reasons. Partially supportedfeaturesquoted
quoted
may be completed in software as long as hardware performs most of the work (such as queue redirection and packet recognition). However PMDs are expected to do their best to satisfy applicationrequestsquoted
quoted
by working around hardware limitations as long as doing so does notaffectquoted
quoted
the behavior of existing flow rules. The following sections provide a few examples of such cases, they arebased Adrien Mazarguil 6WIND