Re: [RFC v2] doc compression API for DPDK
From: Verma, Shally <hidden>
Date: 2018-02-20 09:58:19
-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 17 February 2018 02:52
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish <redacted>; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK
-----Original Message-----
From: Verma, Shally [mailto:Shally.Verma@cavium.com]
Sent: Friday, February 16, 2018 7:17 AM
To: Ahmed Mansour <redacted>; Trahe, Fiona <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish [off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Fiona, Ahmed
-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 16 February 2018 02:40
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish [off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

On 2/15/2018 1:47 PM, Trahe, Fiona wrote:
Hi Shally, Ahmed,
Sorry for the delay in replying. Comments below.
-----Original Message-----
From: Verma, Shally [mailto:Shally.Verma@cavium.com]
Sent: Wednesday, February 14, 2018 7:41 AM
To: Ahmed Mansour <redacted>; Trahe, Fiona <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish [off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Ahmed,
-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 02 February 2018 01:53
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish [off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

On 1/31/2018 2:03 PM, Trahe, Fiona wrote:
Hi Ahmed, Shally,
///snip///
D.1.1 Stateless and OUT_OF_SPACE
------------------------------------------------
OUT_OF_SPACE is a condition where the output buffer runs out of space and
the PMD still has more data to produce. If the PMD runs into such a condition,
then it's an error condition in stateless processing.
In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e.
no input read, no output written.
Application can resubmit the full input with a larger output buffer size.

[Ahmed] Can we add an option to allow the user to read the data that was
produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.

[Shally] It is there, but applicable to the stateful operation type (please refer to handling out_of_space under
"Stateful Section"). By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size
with certainty and ensures that the uncompressed data size cannot grow beyond the provided output buffer.
Such apps can submit an op with type = STATELESS and provide the full input; the PMD then assumes it has
sufficient input and output and thus doesn't need to maintain any context after the op is processed.
If the application doesn't know the max output size, then it should process it as a stateful op, i.e. set up the op
with type = STATEFUL and attach a stream so that the PMD can maintain the relevant context to handle such
condition.

[Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept. In the stateless case where a PMD reports OUT_OF_SPACE in the decompression case, it could also return consumed=0, produced=x, where x>0. X indicates the amount of valid data which has been written to the output buffer. It is not complete, but if an application wants to search it, it may be sufficient. If the application still wants the data, it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on, as the PMD has not maintained state, history or data. I don't think there would be any need to indicate this in capabilities: PMDs which cannot provide this functionality would always return produced=consumed=0, while PMDs which can could set produced > 0. If this works for you both, we could consider a similar case for compression.

[Shally] Sounds fine to me. Though in that case, consumed should also be updated to the actual
amount consumed by the PMD.
Setting consumed = 0 with produced > 0 doesn't correlate.

[Ahmed] I like Fiona's suggestion, but I also do not like the implication of returning consumed = 0. At the same time, returning consumed = y implies to the user that it can proceed from the middle. I prefer the consumed = 0 implementation, but I think a different return is needed to distinguish it from an OUT_OF_SPACE that the user can recover from. Perhaps OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows future PMD implementations to provide recoverability even in STATELESS mode if they so wish. In this model STATELESS or STATEFUL would be a hint for the PMD implementation to make optimizations for each case, but it does not force the PMD implementation to limit functionality if it can provide recoverability.

[Fiona] So you're suggesting the following:
OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced can be used and the next op in the stream should continue on from op.consumed+1.
OUT_OF_SPACE_TERMINATED - returned only on stateless operation. Error condition, no recovery possible. consumed=produced=0. Application must resubmit all input data with a bigger output buffer.
OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
 - consumed = 0, produced > 0. Application must resubmit all input data with a bigger output buffer. However, in the decompression case, data up to produced in the dst buffer may be inspected/searched. Never happens in the compression case, as the output data would be meaningless.
 - consumed > 0, produced > 0. PMD has stored relevant state and history and so can convert to stateful, using op.produced and continuing from consumed+1.
I don't expect our PMDs to use this last case, but maybe this works for others? I'm not convinced it's not just adding complexity. It sounds like a version of stateful without a stream, and maybe less efficient? If so, should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not simply have submitted a STATEFUL request if this is the behaviour it wants.

[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely and replacing it with:
OUT_OF_SPACE_TERMINATED - returned only on stateless operation. Error condition, no recovery possible.
 - consumed=0, produced=amount of data produced. Application must resubmit all input data with a bigger output buffer to process all of the op.
OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not an error. Op.produced can be used and the next op in the stream should continue on from op.consumed+1.
 - consumed > 0, produced > 0. PMD has stored relevant state and history and so can continue using op.produced and continuing from consumed+1.
We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our implementation either, regardless of speculative future PMDs. The more important aspect of this for today is that the return status clearly determines the meaning of "consumed". If it is RECOVERABLE, then consumed is meaningful; if it is TERMINATED, then consumed is meaningless. This way we take away the ambiguity of having OUT_OF_SPACE mean two different user workflows.
A speculative future PMD may be designed to return RECOVERABLE for stateless ops that are attached to streams. Such a PMD may check whether an op has a stream attached, write out the state there and go into recoverable mode. In essence this leaves the choice up to the implementation and allows the PMD to take advantage of stateless optimizations so long as a "RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon as it fully processes an op. It will only write context out in cases where the op chokes.
This futuristic PMD should ignore the FLUSH, since this is STATELESS mode as indicated by the user, and optimize.

[Shally] IMO, it looks okay to have two separate return codes, TERMINATED and RECOVERABLE, with the definitions you mentioned, and it seems doable. So then it means all the following conditions:
a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED, i.e. the user has to start all over again; it's a failure (as in the current definition).
b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note if RECOVERABLE, then the PMD will maintain states in the stream pointer).
c. stateful with flush = full / NO_SYNC, stream pointer always there: PMD will return TERMINATED/RECOVERABLE depending on whether the STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not.

[Fiona] I don't think the flush flag is relevant - it could be out of space on any flush flag, and if out of space it should ignore the flush flag.
Is there a need for TERMINATED? - I didn't think it would ever need to be returned in the stateful case.
Why the reference to the feature flag? If a PMD doesn't support a feature, I think it should fail the op - not with out-of-space, but unsupported or similar. Or it would fail on stream creation.

[Ahmed] Agreed with Fiona. The flush flag only matters on success. By definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful mode when it runs out of space.
@Shally If the user did not provide a stream, then the PMD should probably return TERMINATED every time. I am not sure we should make a "really smart" PMD which returns RECOVERABLE even if no stream pointer was given. In that case the PMD must give some ID back to the caller that the caller can use to "recover" the op. I am not sure how it would be implemented in the PMD, or when the PMD would decide to retire streams belonging to dead ops that the caller decided not to "recover".
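A minimal sketch of the application side of a stateless op with no stream, under the TERMINATED semantics discussed here: consumed is meaningless, the first `produced` bytes may be inspected (useful for the search case), and the whole input is resubmitted with a bigger buffer. The `struct op`, the status names and `mock_stateless_decompress()` (a toy run-length decoder standing in for a PMD) are hypothetical illustrations, not the DPDK API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes named after the proposal in this thread. */
enum op_status { OP_SUCCESS, OP_OUT_OF_SPACE_TERMINATED };

/* Hypothetical stateless op: the full input and a single output buffer. */
struct op {
    const unsigned char *src;
    size_t src_len;
    unsigned char *dst;
    size_t dst_len;
    size_t consumed;
    size_t produced;
    enum op_status status;
};

/* Toy stand-in for a PMD: input is (count, byte) pairs, output is the
 * expanded runs. On overflow it keeps the partial output (produced > 0)
 * but reports consumed = 0, i.e. TERMINATED: nothing can be continued. */
static void mock_stateless_decompress(struct op *op)
{
    size_t out = 0;

    for (size_t i = 0; i + 1 < op->src_len; i += 2) {
        unsigned run = op->src[i];
        for (unsigned n = 0; n < run; n++) {
            if (out == op->dst_len) {
                op->consumed = 0;
                op->produced = out;
                op->status = OP_OUT_OF_SPACE_TERMINATED;
                return;
            }
            op->dst[out++] = op->src[i + 1];
        }
    }
    op->consumed = op->src_len;
    op->produced = out;
    op->status = OP_SUCCESS;
}

/* Resubmit the whole input with a doubled output buffer until it fits.
 * Returns a malloc'd buffer (caller frees) and stores its length, or NULL. */
static unsigned char *decompress_stateless(const unsigned char *src, size_t src_len,
                                           size_t initial_cap, size_t *out_len)
{
    size_t cap = initial_cap ? initial_cap : 1;

    for (;;) {
        unsigned char *dst = malloc(cap);
        if (dst == NULL)
            return NULL;

        struct op op = { src, src_len, dst, cap, 0, 0, OP_SUCCESS };
        mock_stateless_decompress(&op);
        if (op.status == OP_SUCCESS) {
            *out_len = op.produced;
            return dst;
        }
        /* TERMINATED: dst[0..produced) may be searched here, but consumed
         * carries no meaning, so the retry starts from the first byte. */
        assert(op.consumed == 0);
        free(dst);
        cap *= 2;
    }
}
```

Doubling is only one possible growth policy; an application that knows an upper bound on the expansion could size the buffer once and avoid the retries entirely.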
And one more exception case is:
d. stateless with flush = full, no stream pointer provided: PMD can return RECOVERABLE, i.e. the PMD internally maintained that state somehow and consumed & produced > 0, so the user can start at consumed+1,
but there's a restriction on the user not to alter or change the op until it is fully processed?!

[Fiona] Why the need for this case?
There's always a restriction on the user not to alter or change the op until it is fully processed.
If a PMD can do this - why doesn't it create a stream when that API is called - and then it's the same as b?

[Ahmed] Agreed. The user should not touch an op once enqueued until they receive it in dequeue.
We ignore the flush in stateless mode. We assume it to be final every time.

[Shally] Agreed, and I am not in favour of supporting such an implementation either. I just listed out the different possibilities here to better visualise Ahmed's requirements and the applicability of TERMINATED and RECOVERABLE.
The API currently takes care of cases a and c, and case b can be supported if the specification accepts another proposal which mentions optional usage of a stream with stateless ops.

[Fiona] API has this, but, as we agreed, calling create_stream() with an op_type parameter (stateful/stateless) is not optional. The PMD can return NULL or provide a stream; if the latter, then that stream must be attached to ops.

Until then, the API makes no distinction between cases b and c, i.e. we can have an op such as:
 - type = stateful with flush = full/final, stream pointer provided: PMD can return TERMINATED/RECOVERABLE according to its ability.
Case d is something exceptional; if there's a requirement for PMDs to support it, then I believe it will be doable with the concept of different return codes.

[Fiona] That's not quite how I understood it. Can it be simpler, with only the following cases?
a. stateless with flush = full/final, no stream pointer provided: PMD can return TERMINATED, i.e. the user has to start all over again; it's a failure (as in the current definition). consumed = 0, produced = amount of data produced. This is usually 0, but in the decompression case a PMD may return > 0 and the application may find it useful to inspect that data.
b. stateless with flush = full/final, stream pointer provided: here it's up to the PMD to return either TERMINATED or RECOVERABLE depending upon its ability (note if RECOVERABLE, then the PMD will maintain states in the stream pointer).
c. stateful with flush = any, stream pointer always there: PMD will return RECOVERABLE. op.produced can be used and the next op in the stream should continue on from op.consumed+1. Consumed=0, produced=0 is an unusual but allowed case. I'm not sure if it could ever happen, but there's no need to change the state to TERMINATED in this case. There may be useful state/history stored in the PMD, even though no output has been produced yet.

[Ahmed] Agreed

[Shally] Sounds good.
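The three agreed cases can be written down as a small decision table, together with where the next op should start reading. This is a hedged sketch: `enum op_type`, `enum oos_status` and the `can_recover` parameter (standing for whatever ability a given PMD has to save state into a stream) are hypothetical names, and "continue from op.consumed+1" is expressed as a 0-based source offset `cur_offset + consumed`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_type { OP_STATELESS, OP_STATEFUL };
enum oos_status { OOS_TERMINATED, OOS_RECOVERABLE };

/* Status a PMD reports when it runs out of output space. The flush flag is
 * deliberately not a parameter: as agreed above, it only matters on success. */
static enum oos_status out_of_space_status(enum op_type type, bool has_stream,
                                           bool can_recover)
{
    if (type == OP_STATEFUL)
        return OOS_RECOVERABLE;   /* case c: stream always present */
    if (has_stream && can_recover)
        return OOS_RECOVERABLE;   /* case b: PMD saved state in the stream */
    return OOS_TERMINATED;        /* case a, or case b without recovery */
}

/* 0-based offset into the original input where the next op must start.
 * Only RECOVERABLE gives "consumed" a meaning; consumed == 0 is allowed. */
static size_t next_src_offset(enum oos_status st, size_t cur_offset,
                              size_t consumed)
{
    if (st == OOS_RECOVERABLE)
        return cur_offset + consumed;  /* continue right after consumed bytes */
    return 0;                          /* TERMINATED: resubmit all input */
}
```

Keeping the decision a pure function of op type, stream presence and PMD ability mirrors the point made above that the return status, not the flush flag, is what tells the caller whether consumed can be trusted.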
D.2 Compression API Stateful operation
----------------------------------------------------------
A Stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process related chunks of data, either because
- the application broke the data into several ops, and/or
- the PMD ran into an out_of_space situation during input processing.
In case of either one or all of the above conditions, the PMD is required to maintain the state of the op across enqueue_burst() calls, and
ops are set up with op_type RTE_COMP_OP_STATEFUL, begin with flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value RTE_COMP_FULL/FINAL_FLUSH.
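To make these rules concrete, here is a minimal sketch of one stateful stream driven with at most one op in flight: every chunk except the last carries a no-flush value, the last carries final flush, and an OUT_OF_SPACE_RECOVERABLE result is handled by continuing from `consumed` with fresh output space. `enqueue_one()`/`dequeue_one()`, the `FLUSH_*` and `ST_*` names are hypothetical stand-ins for enqueue_burst()/dequeue_burst() with a burst size of 1 and for the RTE_COMP_* values, and the toy "PMD" simply copies bytes instead of compressing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for RTE_COMP_NO/SYNC_FLUSH and RTE_COMP_FULL/FINAL_FLUSH. */
enum flush { FLUSH_NONE, FLUSH_FINAL };
enum status { ST_SUCCESS, ST_OOS_RECOVERABLE };

struct op {
    const char *src;
    size_t src_len;
    char *dst;
    size_t dst_len;
    enum flush flush;      /* ignored by the toy PMD below */
    size_t consumed;
    size_t produced;
    enum status status;
};

/* Toy PMD with a single slot; "processing" copies as much input as fits. */
static struct op *inflight;

static int enqueue_one(struct op *op)
{
    if (inflight != NULL)
        return 0;          /* at most one op of the stream in flight */
    size_t n = op->src_len < op->dst_len ? op->src_len : op->dst_len;
    memcpy(op->dst, op->src, n);
    op->consumed = op->produced = n;
    op->status = n < op->src_len ? ST_OOS_RECOVERABLE : ST_SUCCESS;
    inflight = op;
    return 1;
}

static struct op *dequeue_one(void)
{
    struct op *op = inflight;
    inflight = NULL;
    return op;
}

/* Feed `len` bytes of `in` in chunks of `chunk` bytes, giving each op an
 * output window of `win` bytes inside `out` (which must hold `len` bytes).
 * Returns the total number of bytes produced. */
static size_t run_stream(const char *in, size_t len, size_t chunk,
                         char *out, size_t win)
{
    size_t off = 0, total = 0;

    assert(chunk > 0 && win > 0);
    while (off < len) {
        size_t n = len - off < chunk ? len - off : chunk;
        struct op op = { in + off, n, out + total, win,
                         off + n == len ? FLUSH_FINAL : FLUSH_NONE,
                         0, 0, ST_SUCCESS };
        int accepted = enqueue_one(&op);
        assert(accepted == 1);
        (void)accepted;
        struct op *done = dequeue_one();  /* dequeue before the next enqueue */
        total += done->produced;
        off += done->consumed;            /* RECOVERABLE: resume after consumed */
    }
    return total;
}
```

With a 6-byte chunk and a 4-byte window, each chunk hits out-of-space once and the loop simply re-chunks from the new offset, which is the enqueue, dequeue, enqueue pattern described in D.2.1.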
D.2.1 Stateful operation state maintenance
---------------------------------------------------------------
It is always the ideal expectation that the application parses through all related chunks of source data, making its mbuf-chain, and enqueues
it for stateless processing.
However, if it needs to break it into several enqueue_burst() calls, then an
expected call flow would be something like:
enqueue_burst( |op.no_flush |)

[Ahmed] The work is now in flight to the PMD. The user will call dequeue_burst in a loop until all ops are received. Is this correct?

dequeue_burst(op) // should dequeue before we enqueue next

[Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in
the context of stateful op processing, to reflect that if a stream is broken into chunks, then each chunk should be
submitted as one op at a time with type = STATEFUL and needs to be dequeued before the next chunk is
enqueued.

enqueue_burst( |op.no_flush |)
dequeue_burst(op) // should dequeue before we enqueue next
enqueue_burst( |op.full_flush |)

[Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish
the response in exception cases?

[Shally] Multiple ops are allowed in flight; however, the condition is that each op in such a case is independent of
the others, i.e. they belong to different streams altogether.
Earlier (as part of the RFC v1 doc) we did consider the proposal to process all related chunks of data in a single
burst by passing them as an ops array, but later found that not so useful for PMD handling for various
reasons. Please refer to the RFC v1 doc review comments for the same.

[Fiona] Agree with Shally. In summary, as only one op can be processed at a time, since each needs the state of the previous, allowing more than 1 op to be in flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.

[Ahmed] But we are putting the ops on qps, which would make them sequential. Handling OUT_OF_SPACE conditions would be a little bit more complex but doable.

[Fiona] In my opinion this is not doable; it could be very inefficient. There may be many streams. The PMD would have to have an internal queue per stream so it could adjust the next src offset and length in the OUT_OF_SPACE case. And this may ripple back through all subsequent ops in the stream as each source len is increased and its dst buffer is not big enough.

[Ahmed] Regarding multi-op OUT_OF_SPACE handling: the caller would still need to adjust the src length/output buffer as you say. The PMD cannot handle OUT_OF_SPACE internally. After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream until it gets explicit confirmation from the caller to continue working on this stream. Any ops received by the PMD should be returned to the caller with status STREAM_PAUSED, since the caller did not explicitly acknowledge that it has solved the OUT_OF_SPACE issue. These semantics can be enabled by adding a new function to the API, perhaps stream_resume(). This allows the caller to indicate that it acknowledges that it has seen the issue and that this op should be used to resolve the issue. Implementations that do not support this mode of use can push back immediately after one op is in flight. Implementations that support this use mode can allow many ops from the same session.

[Shally] Is it still in the context of having a single burst where all ops belong to one stream? If yes, I would still
say it would add an overhead to PMDs, especially if they are expected to work close to the HW (which I think is
the case with DPDK PMDs). Though your approach is doable, why can't all this be in a layer above the PMD? i.e. a layer above the PMD
can either pass one op at a time with burst size = 1 OR can make a chained mbuf of input and output and
pass that as one op. Is it just to ease the application's chained mbuf burden, or do you see any performance / use-case impacting aspect also?
If it is in the context where each op in a burst belongs to a different stream, then why do we need stream_pause and resume? It is expected of the app to pass a larger output buffer with consumed + 1
from the next call onwards, as it has already seen OUT_OF_SPACE.

[Ahmed] Yes, this would add extra overhead to the PMD. Our PMD implementation rejects all ops that belong to a stream that has entered "RECOVERABLE" state for one reason or another. The caller must acknowledge explicitly that it has received news of the problem before the PMD allows this stream to exit "RECOVERABLE" state. I agree with you that implementing this functionality in the software layer above the PMD is a bad idea since the latency reductions are lost.

[Shally] Just reiterating, I rather meant the other way around, i.e. I see it as easier to put all such complexity in a layer above the PMD.
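Shally's alternative, keeping this complexity in a layer above the PMD, could look roughly like the sketch below: the layer accepts any number of ops per stream from the application but releases at most one per stream to the PMD, parking the rest in a per-stream FIFO until the previous op completes. Everything here (`shim_submit()`, `shim_complete()`, the fixed-size queues, integer op ids) is hypothetical and only illustrates the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STREAMS 4
#define MAX_PENDING 8

/* Per-stream state kept by a hypothetical layer above the PMD. */
struct stream_q {
    int pending[MAX_PENDING];  /* op ids waiting for their predecessor */
    size_t head, count;        /* ring-buffer FIFO */
    bool busy;                 /* an op of this stream is at the PMD */
};

static struct stream_q streams[MAX_STREAMS];

/* Application submits op `id` on stream `s`. Returns the id that should be
 * passed to the PMD now, or -1 if it was parked behind an in-flight op. */
static int shim_submit(int s, int id)
{
    struct stream_q *q = &streams[s];

    if (!q->busy) {
        q->busy = true;
        return id;
    }
    assert(q->count < MAX_PENDING);
    q->pending[(q->head + q->count) % MAX_PENDING] = id;
    q->count++;
    return -1;
}

/* PMD completed the in-flight op of stream `s`. Returns the next op id to
 * send for that stream, or -1 if the stream is now idle. If the completed
 * op ran out of space, the caller must fix up that next op (src offset,
 * dst buffer) before actually sending it. */
static int shim_complete(int s)
{
    struct stream_q *q = &streams[s];

    assert(q->busy);
    if (q->count == 0) {
        q->busy = false;
        return -1;
    }
    int next = q->pending[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    return next;               /* stream stays busy with this op */
}
```

The trade-off Ahmed points out is visible here: the second op of a stream cannot reach the hardware until software observes the first completion, so the latency benefit of back-to-back enqueues is lost.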
This setup is useful in latency-sensitive applications where the latency of buffering multiple ops into one op is significant. We found latency makes a significant difference in search applications where the PMD competes with software decompression.

[Fiona] I see, so when all goes well, you get best-case latency, but when out-of-space occurs latency will probably be worse.

[Ahmed] This is exactly right. This use mode assumes out-of-space is a rare occurrence. Recovering from it should take similar time to synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in both sync and async use. The caller can fix up the op and send it back to the PMD to continue work just as would be done in sync. Nonetheless, the added complexity is not justifiable if out-of-space is very common, since the recoverable state will be the limiting factor that forces synchronicity.

[Fiona] I still have concerns with this and would not want to support it in our PMD. To make sure I understand: you want to send a burst of ops, with several from the same stream. If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any subsequent ops in that stream. Should it return them in a dequeue_burst() with status still NOT_PROCESSED? Or somehow drop them? How? While still processing ops from other streams.

[Ahmed] This is exactly correct. It should return them with NOT_PROCESSED. Yes, the PMD should continue processing other streams.
As we want to offload each op to hardware with as little CPU processing as possible, we would not want to open up each op to see which stream it's attached to and make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.

[Ahmed] I think I might have missed your point here, but I will try to answer. There is no need to "cushion" ops in DPDK. DPDK should send ops to the PMD and the PMD should reject them until stream_continue() is called. The next op to be sent by the user will have a special marker in it to inform the PMD to continue working on this stream. Alternatively the DPDK layer can be made "smarter" to fail during the enqueue by checking the stream and its state, but like you say this adds additional CPU overhead during the enqueue.
I am curious. In a simple synchronous use case, how do we prevent users from putting multiple ops in flight that belong to a single stream? Do we just currently say it is undefined behavior? Otherwise we would have to check the stream and incur the CPU overhead.

[Fiona] We don't do anything to prevent it. It's undefined. IMO on the data path in the DPDK model we expect good behaviour and don't have to error-check for things like this.

[Ahmed] This makes sense. We also assume good behavior.
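Ahmed's proposed semantics, where a stream that hit out-of-space bounces further ops back as NOT_PROCESSED until the caller explicitly resumes it, amount to a small per-stream state machine. The sketch below models it under stated assumptions: the `resume` field stands for the "special marker" / stream_continue() idea (no such call exists in the API as drafted), ops are processed in arrival order, and the toy size rule (1 input byte produces 1 output byte) and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_status { ST_SUCCESS, ST_OOS_RECOVERABLE, ST_NOT_PROCESSED };

struct stream { bool paused; };

struct op {
    struct stream *stream;
    size_t src_len, dst_len;  /* toy model: 1 input byte -> 1 output byte */
    bool resume;              /* hypothetical "stream_continue" marker */
    enum op_status status;
};

/* Process one op as the proposal describes: ops on a paused stream are
 * returned untouched unless they carry the resume marker; ops on other
 * streams are unaffected. */
static void process(struct op *op)
{
    struct stream *s = op->stream;

    if (s->paused && !op->resume) {
        op->status = ST_NOT_PROCESSED;
        return;
    }
    s->paused = false;
    if (op->src_len > op->dst_len) {
        s->paused = true;     /* enter the "recoverable" state */
        op->status = ST_OOS_RECOVERABLE;
    } else {
        op->status = ST_SUCCESS;
    }
}
```

Note that even this toy has to look up the op's stream on every op, which is exactly the per-op inspection Fiona wants to avoid in a software PMD; Ahmed's hardware does the equivalent check itself.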
In our PMD, if we got a burst of 20 ops, we allocate 20 spaces on the hw q, then build and send those messages. If we found an op from a stream which already had one in flight, we'd have to hold that back, store it in a sw stream-specific holding queue, and only send 19 to hw. We cannot send multiple ops from the same stream to the hw as it fans them out and does them in parallel. Once the enqueue_burst() returns, there is no processing context which would spot that the first has completed and send the next op to the hw. On a dequeue_burst() we would spot this, and in that context could process the next op in the stream. On out of space, instead of processing the next op we would have to transfer all unprocessed ops from the stream to the dequeue result. Some parts of this are doable, but it seems likely to add a lot more latency; we'd need to add extra threads and timers to move ops from the sw queue to the hw q to get any benefit, and these constructs would add context switching and CPU cycles. So we prefer to push this responsibility above the API, where it can achieve similar results.

[Ahmed] I see what you mean. Our workflow is almost exactly the same with our hardware, but the fanning out is done by the hardware based on the stream, and ops that belong to the same stream are never allowed to go out of order. Otherwise the data would be corrupted. Likewise the hardware is responsible for checking the state of the stream and returning frames as NOT_PROCESSED to the software.
Maybe we could add a capability if this behaviour is important for you? e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS? Our PMD would set this to 0, and expect no more than one op from a stateful stream to be in flight at any time.

[Ahmed] That makes sense. This way the different DPDK implementations do not have to add extra checking for unsupported cases.

[Shally] @ahmed, if I summarise your use-case, is this how you want the PMD to support it?
- a burst *carries only one stream* and all ops are then assumed to belong to that stream? (please note, here the burst is not carrying more than one stream)
[Ahmed] No. In this use case the caller sets up an op and enqueues a single op. Then, before the response comes back from the PMD, the caller enqueues a second op on the same stream.
- PMD will submit one op at a time to HW?
[Ahmed] I misunderstood what PMD means. I used it throughout to mean the HW. I used DPDK to mean the software implementation that talks to the hardware. The software will submit all ops immediately. The hardware has to figure out what to do with the ops depending on what stream they belong to.
- if processed successfully, push it back to the completion queue with status = SUCCESS. If it failed or ran into OUT_OF_SPACE, then push it to the completion queue with status = FAILURE/OUT_OF_SPACE_RECOVERABLE and the rest with status = NOT_PROCESSED, and return with enqueue count = total # of ops originally submitted with the burst?
[Ahmed] This is exactly what I had in mind. All ops will be submitted to the HW. The HW will put all of them on the completion queue with the correct status exactly as you say.
- app assumes all have been enqueued, so it goes and dequeues all ops
- on seeing an op with OUT_OF_SPACE_RECOVERABLE, the app resubmits a burst of ops with a call to a stream_continue/resume API, starting from the op which encountered OUT_OF_SPACE and the others marked NOT_PROCESSED, with updated input and output buffers?
[Ahmed] Correct, this is what we do today in our proprietary API.
- repeat until *all* are dequeued with status = SUCCESS or *any* with status = FAILURE? If at any time a failure is seen, does the app start the whole processing all over again or just drop this burst?!
[Ahmed] The app has the choice on how to proceed. If the issue is recoverable then the application can continue this stream from where it stopped. If the failure is unrecoverable then the application should first fix the problem and start from the beginning of the stream.
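The application side of the flow summarised in the list above could be sketched as follows: after dequeuing all ops of one stream, it rebuilds the resubmission list from the op that hit OUT_OF_SPACE_RECOVERABLE (trimmed to its unconsumed tail) plus every NOT_PROCESSED op, and gives up on FAILURE. The struct layout, status names and `build_resubmit()` are hypothetical; a real application would also give the trimmed op a new dst buffer and mark the first resubmitted op with whatever resume mechanism is eventually agreed.

```c
#include <assert.h>
#include <stddef.h>

enum op_status { ST_SUCCESS, ST_FAILURE, ST_OOS_RECOVERABLE, ST_NOT_PROCESSED };

struct op {
    size_t src_off, src_len;  /* window into the stream's input */
    size_t consumed;
    enum op_status status;
};

/* Walk the dequeued ops of one stream (in submission order) and collect the
 * ops to resubmit into `out`. Returns their count, or -1 on an unrecoverable
 * FAILURE, in which case the stream must restart from the beginning. */
static int build_resubmit(struct op *ops, size_t n, struct op **out)
{
    int k = 0;

    for (size_t i = 0; i < n; i++) {
        struct op *op = &ops[i];

        switch (op->status) {
        case ST_SUCCESS:
            break;
        case ST_FAILURE:
            return -1;
        case ST_OOS_RECOVERABLE:
            assert(op->consumed <= op->src_len);
            op->src_off += op->consumed;  /* resume right after consumed bytes */
            op->src_len -= op->consumed;
            out[k++] = op;
            break;
        case ST_NOT_PROCESSED:
            out[k++] = op;                /* resend unchanged, same order */
            break;
        }
    }
    return k;
}
```

Keeping the resubmitted ops in their original order matters here, since each one depends on the stream state left by the previous one.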
If all of the above is true, then I think we should add another API such as rte_comp_enque_single_stream() which will be functional under Feature Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS, or a better name is SUPPORT_ENQUEUE_SINGLE_STREAM?!

[Ahmed] The main advantage in async use is lost if we force all related ops to be in the same burst. If we do that, then we might as well merge all the ops into one op. That would reduce the overhead. The use mode I am proposing is only useful in cases where the data becomes available after the first enqueue occurred. I want to allow the caller to enqueue the second set of data as soon as it is available, regardless of whether or not the HW has already started working on the first op in flight.
[Shally] @ahmed, OK, it seems I missed a point here. So, please confirm the following:
As per the current description in the doc, the expected stateful usage is:
enqueue(op1) --> dequeue(op1) --> enqueue(op2)
but you're suggesting to allow an option to change it to
enqueue(op1) --> enqueue(op2)
i.e. multiple ops from the same stream can be put in flight via subsequent enqueue_burst() calls without waiting to dequeue previous ones, if the PMD supports it. So, no change to the current definition of a burst: it will still carry multiple streams, where each op belongs to a different stream?!
If yes, then it seems your HW can be set up for multiple streams, so it is efficient for your case to support it in the DPDK PMD layer, but our hw doesn't by default and needs SW to back it. Given that, I also suggest enabling it under some feature flag.
However, it looks like an add-on, and if it doesn't change the current definition of a burst and the minimum expectation set on stateful processing described in this document, then IMO you can propose this feature as an incremental patch on the baseline version, in the absence of which the application will exercise stateful processing as described here (enq->deq->enq). Thoughts?
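With a capability bit as proposed in this thread, the application could choose between the two usage patterns at run time. A minimal sketch, assuming a hypothetical feature bit named after the proposal (it is not an existing DPDK flag) and a hypothetical helper that tells the application how many more ops of one stateful stream it may put in flight right now.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bit named after the proposal in this thread. */
#define FF_ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS (UINT64_C(1) << 0)

/* How many more ops of one stateful stream may be enqueued now, given the
 * device feature flags, the ops already in flight on that stream and the
 * ops the application has ready. */
static unsigned stateful_ops_allowed(uint64_t feature_flags,
                                     unsigned inflight, unsigned ready)
{
    if (feature_flags & FF_ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS)
        return ready;                            /* enq(op1) --> enq(op2) */
    return inflight == 0 && ready > 0 ? 1 : 0;   /* enq --> deq --> enq */
}
```

Without the bit, the baseline behaviour falls out naturally: at most one op per stream is ever in flight, matching the enq->deq->enq flow in the document.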
[Fiona] Am curious about Ahmed's response to this. I didn't get that a burst should carry only one stream, or how this makes a difference, as there can be many enqueue_burst() calls done before a dequeue_burst(). Maybe you're thinking the enqueue_burst() would be a blocking call that would not return until all the ops had been processed? This would turn it into a synchronous call, which isn't the intent.

[Ahmed] Agreed, a blocking or even a buffering software layer that babysits the hardware does not fundamentally change the parameters of the system as a whole. It just moves workflow management complexity down into the DPDK software layer. Rather, there are real latency and throughput advantages (because of caching) that I want to expose.
/// snip ///