Re: [RFC] Generic flow director/filtering/classification API
From: Chandran, Sugesh <hidden>
Date: 2016-07-18 13:26:27
Hi Adrien, Thank you for getting back on this. Please find my comments below. Regards _Sugesh
-----Original Message-----
From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
Sent: Friday, July 15, 2016 4:04 PM
To: Chandran, Sugesh <redacted>
Cc: dev@dpdk.org; Thomas Monjalon <redacted>; Zhang, Helin [off-list ref]; Wu, Jingjing [off-list ref]; Rasesh Mody [off-list ref]; Ajit Khaparde [off-list ref]; Rahul Lakkireddy [off-list ref]; Lu, Wenzhuo [off-list ref]; Jan Medala [off-list ref]; John Daley [off-list ref]; Chen, Jing D [off-list ref]; Ananyev, Konstantin [off-list ref]; Matej Vido [off-list ref]; Alejandro Lucero [off-list ref]; Sony Chacko [off-list ref]; Jerin Jacob [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Olga Shern [off-list ref]; Chilikin, Andrey [off-list ref]
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
On Fri, Jul 15, 2016 at 09:23:26AM +0000, Chandran, Sugesh wrote:
Thank you Adrien, Please find below some more comments/inputs. Let me know your thoughts on this.
Thanks, stripping again non relevant parts.
[...]
[Sugesh] Is it a limitation to use only 32 bit ID? Is it possible to have a 64 bit ID, so that the application can use the control plane flow pointer itself as an ID? Does it make sense?
I've specified a 32 bit ID for now because this is what FDIR supports and also what existing devices can report today AFAIK (i40e and mlx5). We could use 64 bit for future-proofness in a separate action like "ID64" when at least one device supports it. To PMD maintainers: please comment if you know devices that support tagging matching packets with more than 32 bits of user-provided data!
[Sugesh] I guess the flow director ID is 64 bit, the XL710 datasheet says so.
And in the 'rte_mbuf' structure the 64 bit FDIR-ID is shared with the rss hash. This can be a software driver limitation that exposes only 32 bit, possibly because of cache alignment issues? Since the hardware can support 64 bit, I feel it makes sense to support 64 bit as well.
I agree we need 64 bit support, but then we also need a solution for devices that support only 32 bit. Possible methods I can think of:
- A separate "ID64" action (or a "ID32" one, perhaps with a better name).
- A single ID action with an unlimited number of bytes to return with packets (would actually be a string). PMDs can then refuse to create flow rules requesting an unsupported number of bytes. Devices supporting fewer than 32 bits are also included this way without the need for yet another action.
Thoughts?
[Sugesh] I feel the single ID approach is much better. But I would say a fixed size ID is easy to handle at upper layers. Say the PMD returns a 64 bit ID in which MSBs are masked out, based on how many bits the hardware can support.
PMD can refuse the unsupported number of bytes when requested. So the size of ID is going to be a parameter to program the flow. What do you think?
What you suggest if I am not mistaken is:
    struct rte_flow_action_id {
        uint64_t id;
        uint64_t mask; /* either a bit-mask or a prefix/suffix length? */
    };
I think in this case a mask is more versatile than a prefix/suffix length as the value itself comes in an unknown endian (from PMD's POV). It also allows specific bits to be taken into account, like when HW only supports 32 bit, with some black magic the full original 64 bit value can be restored as long as the application only cares about at most 32 bits anywhere.
However I do not think many applications "won't care" about specific bits in a given value and having to provide a properly crafted mask will be a hassle, they will just fill it with ones and hope for the best. As a result they won't take advantage of this feature or will stick to 32 bits all the time, or whatever happens to be the least common denominator.
My previous suggestion was:
    struct rte_flow_action_id {
        uint8_t size; /* number of bytes in id[] */
        uint8_t id[];
    };
It does not solve the issue if an application requests more bytes than supported, however as a string, there is no endianness ambiguity and these bytes are copied as-is to the related mbuf field as if done through memcpy(), possibly with some padding to fill the entire 64 bit field (copied bytes thus starting from MSB for big-endian machines, LSB for little-endian ones). The value itself remains opaque to the PMD.
One issue is the flexible array approach makes static initialization more complicated. Maybe it is not worth the trouble since according to Andrey, even X710 reports at most 32 bits of user data.
So what should we do? Fixed 32 bits ID for now to keep things simple, then another action for 64 bits later when necessary?
[Sugesh] I agree with you. We could keep things simple by having 32 bit ID now. I mixed up the size of ID with flexible payload size. Sorry about that. In the future, we could add an action for 64 bit if necessary.
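For readers following the ID discussion above, here is a minimal sketch of the two layouts that were compared, written as standalone C. Every name prefixed with sketch_ or SKETCH_ is hypothetical and is not part of any DPDK API; the 4-byte device limit is an assumption chosen to mirror the 32 bit case mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the device can report at most 4 bytes of user data. */
#define SKETCH_HW_ID_BYTES 4

/* Variant A from the thread: fixed 64 bit value plus a bit-mask. */
struct sketch_action_id_mask {
	uint64_t id;
	uint64_t mask; /* bits the application cares about */
};

/* Variant B from the thread: opaque byte string. */
struct sketch_action_id_bytes {
	uint8_t size; /* number of bytes in id[] */
	uint8_t id[]; /* flexible array member */
};

/* Variant A: accept a rule only if every requested bit fits in hw_mask. */
static int
sketch_mask_supported(const struct sketch_action_id_mask *a, uint64_t hw_mask)
{
	assert(a != NULL);
	return (a->mask & ~hw_mask) == 0;
}

/* Variant B cannot be statically initialized easily, so allocate it. */
static struct sketch_action_id_bytes *
sketch_id_bytes_new(const void *data, uint8_t size)
{
	struct sketch_action_id_bytes *a = malloc(sizeof(*a) + size);

	if (a == NULL)
		return NULL;
	a->size = size;
	memcpy(a->id, data, size);
	return a;
}

/*
 * Variant B: copy the bytes as-is into a zero-padded 64 bit field,
 * as if done through memcpy(). Refuse (-1) when more bytes are
 * requested than the assumed device limit.
 */
static int
sketch_bytes_to_field(const struct sketch_action_id_bytes *a, uint64_t *field)
{
	assert(a != NULL && field != NULL);
	if (a->size > SKETCH_HW_ID_BYTES)
		return -1;
	*field = 0;
	memcpy(field, a->id, a->size);
	return 0;
}
```

The byte-string variant keeps the value opaque and free of endianness questions, at the cost of a heap allocation (or an equivalent buffer) where the mask variant allows a plain static initializer.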
[...]
[Sugesh] Another concern is the cost and time of installing these rules in the hardware. Can we make these APIs time bound (or at least have an option to set the time limit to execute these APIs), so that the application doesn’t have to wait so long when installing and deleting flows with slow hardware/NIC? What do you think? Most of the datapath flow installations are dynamic and triggered only when there is ingress traffic. Delay in flow insertion/deletion has unpredictable consequences.
This API is (currently) aimed at the control path only, and must indeed be assumed to be slow. Creating millions of rules may take quite long as it may involve syscalls and other time-consuming synchronization things on the PMD side. So currently there is no plan to have rules added from the data path with time constraints. I think it would be implemented through a different set of functions anyway.
I do not think adding time limits is practical, even specifying in the API that creating a single flow rule must take less than a maximum number of seconds in order to be effective is too much of a constraint (applications that create all flows during init may not care after all).
You should consider in any case that modifying flow rules will always be slower than receiving packets, there is no way around that. Applications have to live with it and provide a software fallback for incoming packets while managing flow rules.
Moreover, think about what happens when you hit the maximum number of flow rules and cannot create any more. Applications need to implement some kind of fallback in their data path.
Offloading flows in HW is also only useful if they live much longer than the time taken to create and delete them. Perhaps applications may choose to do so after detecting long lived flows such as TCP sessions.
You may have one separate control thread dedicated to manage flows and keep your normal control thread unaffected by delays. Several threads can even be dedicated, one per device.
[Sugesh] I agree that the flow insertion cannot be as fast as the packet receiving rate. From the application point of view the problem will be when hardware flow insertion takes longer than software flow insertion. At least the application has to know the cost of inserting/deleting a rule in hardware beforehand. Otherwise how can the application choose the right flow candidate for hardware? My point here is that the application is expecting a deterministic behavior from a classifier while inserting and deleting rules.
Understood, however it will be difficult to estimate, particularly if a PMD must rearrange flow rules to make room for a new one due to priority levels collision or some other HW-related reason. I mean, spent time cannot be assumed to be constant, even PMDs cannot know in advance because it also depends on the performance of the host CPU.
Such applications may find it easier to measure elapsed time for the rules they create, make statistics and extrapolate from this information for future rules. I do not think the PMD can help much here.
[Sugesh] From an application point of view this can be an issue. There is even a security concern when we program a short lived flow. Let's consider the case:
1) Control plane programs the hardware with a queue termination flow.
2) Software dataplane is programmed to treat the packets from the specific queue accordingly.
3) Remove the flow from the hardware. (Let's consider this is a long wait process.)
Or there is even a chance that the hardware takes more time to report the status than to remove the flow physically. Now the packets in the queue are no longer considered as matched/flow hit. This is because the software dataplane update is yet to happen. We need a way to sync between the software datapath and the classifier APIs even though they are both programmed from a different control thread.
Are we saying these APIs are only meant for user defined static flows??
No, that is definitely not the intent. These are good points. With the specified API, applications may have to adapt their logic and take extra precautions in order to remain on the safe side at all times.
For your above example, the application cannot assume a rule is added/deleted as long as the PMD has not completed the related operation, which means keeping the SW rule/fallback in place in the meantime. This should handle security concerns as long as, after removing a rule, packets end up in a default queue entirely processed by SW. Obviously this may worsen response time.
The ID action can help with this. By knowing which rule a received packet is associated with, processing can be temporarily offloaded by another thread without much complexity.
[Sugesh] Setting an ID for every flow may not be viable, especially when the size of the ID is small (just 8 bits). I am not sure if this is a valid case though. How about a hardware flow flag in the packet descriptor that is set when the packet hits any hardware rule? This way software is not worried about/blocked by a hardware rule. Even though there is an additional overhead of validating this flag, the software datapath can identify the hardware processed packets easily. This way the packets traverse the software fallback path until the rule configuration is complete. This flag avoids setting an ID action for every hardware flow being configured.
I think applications have to implement SW fallbacks all the time, as even some sort of guarantee on the flow rule processing time may not be enough to avoid misdirected packets and related security issues.
[Sugesh] Software fallback will always be there. However I am a little bit confused about the way software is going to identify the packets that are already hardware processed. I feel we need some notification in the packet itself when a hardware rule hits. ID/flag/any other options?
Let's wait for applications to start using this API and then consider an extra set of asynchronous / real-time functions when the need arises. It should not impact the way rules are specified.
[Sugesh] Sure. I think the rule definition may not be impacted by this.
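The fallback logic discussed above (keep the SW rule in place until the PMD has completed the add/delete operation, and use the per-packet ID to tell which rule a packet hit) could be tracked on the application side roughly as follows. This is only a sketch under stated assumptions: all sketch_ names are hypothetical, the ID is used directly as a table index, and the table is touched from a single thread, so real code would need whatever synchronization the application uses between its control and data paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: small table, flow ID doubles as the index. */
#define SKETCH_MAX_RULES 256

enum sketch_rule_state {
	SKETCH_SW_ONLY = 0,  /* no HW rule, software handles everything */
	SKETCH_HW_ADDING,    /* creation requested, PMD not done yet */
	SKETCH_HW_ACTIVE,    /* PMD confirmed the rule is in place */
	SKETCH_HW_REMOVING,  /* deletion requested, PMD not done yet */
};

struct sketch_rule_table {
	enum sketch_rule_state state[SKETCH_MAX_RULES];
};

/* Control path: record a state change around the (slow) PMD calls. */
static int
sketch_set_state(struct sketch_rule_table *t, uint32_t id,
		 enum sketch_rule_state s)
{
	assert(t != NULL);
	if (id >= SKETCH_MAX_RULES)
		return -1;
	t->state[id] = s;
	return 0;
}

/*
 * Data path: return 1 only when the packet carries an ID and the
 * matching rule is confirmed active. In every other case (no ID,
 * unknown ID, rule still being added or removed) the packet takes
 * the software fallback path.
 */
static int
sketch_trust_hw(const struct sketch_rule_table *t, int has_id, uint32_t id)
{
	assert(t != NULL);
	if (!has_id || id >= SKETCH_MAX_RULES)
		return 0;
	return t->state[id] == SKETCH_HW_ACTIVE;
}
```

With this arrangement a packet that arrives while a rule is being removed is never treated as a flow hit, which is the safe side of the race described in the example above, at the price of extra software processing during transitions.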
[Sugesh] Another query is on the synchronization part. What if the same rules are handled from different threads? Is the application responsible for handling the concurrent hardware programming?
Like most (if not all) DPDK APIs, applications are responsible for managing locking issues as described in 4.3 (Behavior). Since this is a control path API and applications usually have a single control thread, locking should not be necessary in most cases.
Regarding my above comment about using several control threads to manage different devices, section 4.3 says:
"There is no provision for reentrancy/multi-thread safety, although nothing should prevent different devices from being configured at the same time. PMDs may protect their control path functions accordingly."
I'd like to emphasize it is not "per port" but "per device", since in a few cases a configurable resource is shared by several ports. It may be difficult for applications to determine which ports are shared by a given device but this falls outside the scope of this API.
Do you think adding the guarantee that it is always safe to configure two different ports simultaneously without locking from the application side is necessary? In which case the PMD would be responsible for locking shared resources.
[Sugesh] This would be a little bit complicated when some of the ports are not under DPDK itself (what if one port is managed by the kernel?) or ports are tied to different applications. Locking in the PMD helps when the ports are accessed by multiple DPDK applications. However what if the port itself is not under DPDK?
Well, either we do not care about what happens outside of the DPDK context, or PMDs must find a way to satisfy everyone. I'm not a fan of locking either but it would be nice if flow rules configuration could be attempted on different ports simultaneously without the risk of wrecking anything, so that applications do not need to care.
Possible cases for a dual port device with global flow rule settings affecting both ports:
1) ports 1 & 2 are managed by DPDK: this is the easy case, a rule that needs to alter a global setting necessary for an existing rule on any port is not allowed (EEXIST). PMD must maintain a device context common to both ports in order for this to work. This context is either under lock, or the first port on which a flow rule is created owns all future flow rules.
2) port 1 is managed by DPDK, port 2 by something else, the PMD is aware of it and knows that port 2 may modify the global context: no flow rules can be created from the DPDK application due to safety issues (EBUSY?).
3) port 1 is managed by DPDK, port 2 by something else, the PMD is aware of it and knows that port 2 will not modify flow rules: PMD should not care, no lock necessary.
4) port 1 is managed by DPDK, port 2 by something else and the PMD is not aware of it: either flow rules cannot be created ever at all, or we say it is user's responsibility to make sure this does not happen.
Considering that most control operations performed by DPDK affect the device regardless of other applications, I think 1) is the only case that should be defined, otherwise 4), defined as user's responsibility.
No more comments on this part? What do you suggest?
[Sugesh] I agree with your suggestions. I feel this is the best that can be offered.
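As an illustration only, the dual port cases listed above can be written down as a small decision function. The sketch_ names are hypothetical and do not exist in DPDK; the EBUSY return for case 2 was itself only tentative in the thread; and case 4 is absent because a PMD that is unaware of the other owner has nothing to check. Applying the conflict check in case 3 is an assumption of this sketch, since the thread only states that no lock is necessary there.

```c
#include <errno.h>
#include <stdbool.h>

/* Who manages the other port of the same device (hypothetical enum). */
enum sketch_peer_owner {
	SKETCH_PEER_DPDK,           /* case 1: both ports under DPDK */
	SKETCH_PEER_EXT_MAY_MODIFY, /* case 2: external, may change global context */
	SKETCH_PEER_EXT_NO_MODIFY,  /* case 3: external, will not touch flow rules */
};

/*
 * Return 0 if a new flow rule may be created on this port, or a
 * negative errno value otherwise. alters_global_in_use is true when
 * the new rule would change a global setting that an existing rule
 * depends on.
 */
static int
sketch_flow_create_check(enum sketch_peer_owner peer, bool alters_global_in_use)
{
	switch (peer) {
	case SKETCH_PEER_DPDK:
		/* Shared device context, assumed held under lock by the caller. */
		return alters_global_in_use ? -EEXIST : 0;
	case SKETCH_PEER_EXT_MAY_MODIFY:
		/* Global context can change behind our back: refuse. */
		return -EBUSY;
	case SKETCH_PEER_EXT_NO_MODIFY:
		/* No lock needed; conflict check kept as an assumption. */
		return alters_global_in_use ? -EEXIST : 0;
	}
	return -EINVAL;
}
```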
-- Adrien Mazarguil 6WIND