Re: [RFC v2] doc compression API for DPDK

From: Verma, Shally <hidden>
Date: 2018-02-20 09:58:19

-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 17 February 2018 02:52
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish <redacted>; Sahu, Sunila
[off-list ref]; De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal
[off-list ref]; Jain, Deepak K [off-list ref]; Hemant Agrawal [off-list ref]; Roy
Pledge [off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

quoted

-----Original Message-----
From: Verma, Shally [mailto:Shally.Verma@cavium.com]
Sent: Friday, February 16, 2018 7:17 AM
To: Ahmed Mansour <redacted>; Trahe, Fiona <redacted>;
dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish
[off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K
[off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Fiona, Ahmed

quoted

-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 16 February 2018 02:40
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish
[off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K
[off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

On 2/15/2018 1:47 PM, Trahe, Fiona wrote:

quoted

Hi Shally, Ahmed,
Sorry for the delay in replying,
Comments below

quoted

-----Original Message-----
From: Verma, Shally [mailto:Shally.Verma@cavium.com]
Sent: Wednesday, February 14, 2018 7:41 AM
To: Ahmed Mansour <redacted>; Trahe, Fiona <redacted>;
dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish
[off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K
[off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Ahmed,

quoted

-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 02 February 2018 01:53
To: Trahe, Fiona <redacted>; Verma, Shally <redacted>;
dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>; Gupta, Ashish
[off-list ref]; Sahu, Sunila [off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal [off-list ref]; Jain, Deepak K
[off-list ref]; Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

On 1/31/2018 2:03 PM, Trahe, Fiona wrote:

quoted

Hi Ahmed, Shally,

///snip///

quoted

D.1.1 Stateless and OUT_OF_SPACE
------------------------------------------------
OUT_OF_SPACE is the condition where the output buffer runs out of space
while the PMD still has more data to produce. If a PMD runs into this
condition, it is an error condition in stateless processing.
In that case, the PMD resets itself and returns status
RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e.
no input read, no output written.
The application can resubmit the full input with a larger output buffer.

[Ahmed] Can we add an option to allow the user to read the data that was
produced while still reporting OUT_OF_SPACE? This is mainly useful for
decompression applications doing search.

[Shally] It is there, but applicable to the stateful operation type (please
refer to the handling of out_of_space under the "Stateful Section").
By definition, "stateless" here means that the application (such as IPCOMP)
knows the maximum output size with certainty and ensures that the
uncompressed data size cannot grow beyond the provided output buffer.
Such apps can submit an op with type = STATELESS and provide the full input;
the PMD then assumes it has sufficient input and output and thus doesn't
need to maintain any context after the op is processed.
If the application doesn't know the max output size, then it should process
it as a stateful op, i.e. set up the op with type = STATEFUL and attach a
stream so that the PMD can maintain the relevant context to handle such a
condition.

[Fiona] There may be an alternative that's useful for Ahmed, while still
respecting the stateless concept.
In the stateless case where a PMD reports OUT_OF_SPACE in decompression,
it could also return consumed=0, produced=x, where x>0. x indicates the
amount of valid data which has been written to the output buffer. It is not
complete, but if an application wants to search it, it may be sufficient.
If the application still wants the data it must resubmit the whole input
with a bigger output buffer, and decompression will be repeated from the
start. It cannot expect to continue on, as the PMD has not maintained state,
history or data.
I don't think there would be any need to indicate this in capabilities. PMDs
which cannot provide this functionality would always return
produced=consumed=0, while PMDs which can could set produced > 0.
If this works for you both, we could consider a similar case for compression.
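
A minimal sketch of how an application might use the partial output Fiona
describes. The status enum, the struct and both helper names are
hypothetical and only mirror the terms used in this thread; they are not
the DPDK API. The sketch assumes the PMD sets produced to the number of
valid bytes in the destination buffer when it reports out-of-space.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status values and op fields, named after the thread. */
enum comp_status { COMP_SUCCESS, COMP_OUT_OF_SPACE };

struct comp_op {
    enum comp_status status;
    size_t consumed;            /* input bytes the PMD reports as consumed */
    size_t produced;            /* valid bytes written to dst */
    const unsigned char *dst;   /* destination (output) buffer */
};

/* Search only the valid part of dst (the first `produced` bytes).
 * Returns the offset of the first match, or -1 if there is none. */
static long search_valid_output(const struct comp_op *op,
                                const void *needle, size_t needle_len)
{
    if (needle_len == 0 || op->produced < needle_len)
        return -1;
    for (size_t i = 0; i + needle_len <= op->produced; i++)
        if (memcmp(op->dst + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}

/* Returns 1 when the whole input must be resubmitted with a bigger
 * output buffer, because the PMD kept no state (consumed == 0). */
static int must_resubmit_whole_input(const struct comp_op *op)
{
    return op->status == COMP_OUT_OF_SPACE && op->consumed == 0;
}
```

Even when the search succeeds, the op is still incomplete: an application
that wants the full data resubmits the whole input with a larger buffer.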

[Shally] Sounds fine to me. Though in that case, consumed should also be
updated to the amount actually consumed by the PMD.
Setting consumed = 0 with produced > 0 doesn't correlate.

[Ahmed] I like Fiona's suggestion, but I also do not like the implication
of returning consumed = 0. At the same time returning consumed = y
implies to the user that it can proceed from the middle. I prefer the
consumed = 0 implementation, but I think a different return is needed to
distinguish it from OUT_OF_SPACE that the user can recover from. Perhaps
OUT_OF_SPACE_RECOVERABLE and OUT_OF_SPACE_TERMINATED. This also allows
future PMD implementations to provide recover-ability even in STATELESS
mode if they so wish. In this model STATELESS or STATEFUL would be a
hint for the PMD implementation to make optimizations for each case, but
it does not force the PMD implementation to limit functionality if it
can provide recover-ability.

[Fiona] So you're suggesting the following:
OUT_OF_SPACE - returned only on stateful operation. Not an error. Op.produced
    can be used and next op in stream should continue on from op.consumed+1.
OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
    Error condition, no recovery possible.
    consumed=produced=0. Application must resubmit all input data with
    a bigger output buffer.
OUT_OF_SPACE_RECOVERABLE - returned only on stateless operation, some recovery possible.
     - consumed = 0, produced > 0. Application must resubmit all input data with
        a bigger output buffer. However in decompression case, data up to produced
        in dst buffer may be inspected/searched. Never happens in compression
        case as output data would be meaningless.
     - consumed > 0, produced > 0. PMD has stored relevant state and history and so
        can convert to stateful, using op.produced and continuing from consumed+1.
I don't expect our PMDs to use this last case, but maybe this works for others?
I'm not convinced it's not just adding complexity. It sounds like a version of stateful
without a stream, and maybe less efficient?
If so should it respect the FLUSH flag? Which would have been FULL or FINAL in the op.
Or treat it as FLUSH_NONE or SYNC? I don't know why an application would not
simply have submitted a STATEFUL request if this is the behaviour it wants?

[Ahmed] I was actually suggesting the removal of OUT_OF_SPACE entirely
and replacing it with
OUT_OF_SPACE_TERMINATED - returned only on stateless operation.
    Error condition, no recovery possible.
     - consumed=0, produced=amount of data produced. Application must
        resubmit all input data with a bigger output buffer to process
        all of the op.
OUT_OF_SPACE_RECOVERABLE - normally returned on stateful operation. Not
    an error. Op.produced can be used and next op in stream should
    continue on from op.consumed+1.
     - consumed > 0, produced > 0. PMD has stored relevant state and
        history and so can continue using op.produced and continuing
        from consumed+1.

We would not return OUT_OF_SPACE_RECOVERABLE in stateless mode in our
implementation either.

Regardless of speculative future PMDs, the more important aspect of this
for today is that the return status clearly determines the meaning of
"consumed". If it is RECOVERABLE then consumed is meaningful. If it is
TERMINATED then consumed is meaningless.
This way we take away the ambiguity of having OUT_OF_SPACE mean two
different user work flows.
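
A small sketch of the rule Ahmed states here, that the status alone decides
whether consumed may be used. The enum values and the function name are
hypothetical. Offsets are zero-based, so "continue from consumed+1" in the
thread corresponds to op_start + consumed below.

```c
#include <stddef.h>

/* Hypothetical status values following the names proposed in the thread. */
enum oos_status {
    ST_SUCCESS,
    ST_OUT_OF_SPACE_TERMINATED,
    ST_OUT_OF_SPACE_RECOVERABLE
};

/* Zero-based source offset at which the next submission should start.
 * op_start is the offset where the op that just completed began. */
static size_t next_src_offset(enum oos_status st, size_t op_start,
                              size_t consumed)
{
    switch (st) {
    case ST_SUCCESS:
    case ST_OUT_OF_SPACE_RECOVERABLE:
        /* consumed is meaningful: carry on after the consumed bytes. */
        return op_start + consumed;
    case ST_OUT_OF_SPACE_TERMINATED:
    default:
        /* consumed is meaningless: resubmit this op's input from its start. */
        return op_start;
    }
}
```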

A speculative future PMD may be designed to return RECOVERABLE for
stateless ops that are attached to streams.
A future PMD may look to see if an op has a stream attached and write
out the state there and go into recoverable mode.
In essence this leaves the choice up to the implementation and allows
the PMD to take advantage of stateless optimizations so long as a
"RECOVERABLE" scenario is rarely hit. The PMD will dump context as soon
as it fully processes an op. It will only write context out in cases
where the op chokes.
This futuristic PMD should ignore the FLUSH since this STATELESS mode as
indicated by the user and optimize

[Shally] IMO, it looks okay to have two separate return codes, TERMINATED and
RECOVERABLE, with the definitions you mentioned, and it seems doable.
So then it means all of the following conditions:
a. stateless with flush = full/final, no stream pointer provided, PMD can
   return TERMINATED, i.e. user has to start all over again, it's a failure
   (as in current definition)
b. stateless with flush = full/final, stream pointer provided, here it's up
   to PMD to return either TERMINATED or RECOVERABLE depending upon its
   ability (note if RECOVERABLE, then PMD will maintain states in stream
   pointer)
c. stateful with flush = full / NO_SYNC, stream pointer always there, PMD will
   return TERMINATED/RECOVERABLE depending on whether the
   STATEFUL_COMPRESSION/DECOMPRESSION feature flag is enabled or not

[Fiona] I don't think the flush flag is relevant - it could be out of space
on any flush flag, and if out of space should ignore the flush flag.
Is there a need for TERMINATED? - I didn't think it would ever need to be
returned in the stateful case.
Why the ref to the feature flag? If a PMD doesn't support a feature I think
it should fail the op - not with out-of-space, but unsupported or similar.
Or it would fail on stream creation.

[Ahmed] Agreed with Fiona. The flush flag only matters on success. By
definition the PMD should return OUT_OF_SPACE_RECOVERABLE in stateful
mode when it runs out of space.
@Shally If the user did not provide a stream, then the PMD should
probably return TERMINATED every time. I am not sure we should make a
"really smart" PMD which returns RECOVERABLE even if no stream pointer
was given. In that case the PMD must give some ID back to the caller
that the caller can use to "recover" the op. I am not sure how it would
be implemented in the PMD and when does the PMD decide to retire streams
belonging to dead ops that the caller decided not to "recover".

quoted

and one more exception case is:
d. stateless with flush = full, no stream pointer provided, PMD can return
   RECOVERABLE, i.e. PMD internally maintained that state somehow and
   consumed & produced > 0, so user can start from consumed+1, but there's
   a restriction on user not to alter or change op until it is fully
   processed?!

[Fiona] Why the need for this case?
There's always a restriction on user not to alter or change op until it is fully processed.
If a PMD can do this - why doesn't it create a stream when that API is called - and then it's same as b?

[Ahmed] Agreed. The user should not touch an op once enqueued until they
receive it in dequeue. We ignore the flush in stateless mode. We assume
it to be final every time.

[Shally] Agreed and am not in favour of supporting such implementation either. Just listed out different
possibilities up here to better visualise Ahmed requirements/applicability of TERMINATED and
RECOVERABLE.

quoted

API currently takes care of cases a and c, and case b can be supported if the specification accepts
another proposal which mentions optional usage of a stream with stateless.

[Fiona] API has this, but as we agreed, not optional to call the create_stream() with an op_type
parameter (stateful/stateless). PMD can return NULL or provide a stream, if the latter then that
stream must be attached to ops.

Until then the API makes no distinction between cases b and c, i.e. we can
have an op such as:
- type = stateful with flush = full/final, stream pointer provided, PMD can
  return TERMINATED/RECOVERABLE according to its ability

Case d is something exceptional; if there's a requirement in PMDs to support
it, then I believe it will be doable with the concept of different return
codes.

[Fiona] That's not quite how I understood it. Can it be simpler and only following cases?
a. stateless with flush = full/final, no stream pointer provided , PMD can return TERMINATED i.e. user
    has to start all over again, it's a failure (as in current definition).
    consumed = 0, produced=amount of data produced. This is usually 0, but in decompression
    case a PMD may return > 0 and application may find it useful to inspect that data.
b. stateless with flush = full/final, stream pointer provided, here it's up to PMD to return either
    TERMINATED or RECOVERABLE depending upon its ability (note if Recoverable, then PMD will
    maintain states in stream pointer)
c. stateful with flush = any, stream pointer always there, PMD will return RECOVERABLE.
    op.produced can be used and next op in stream should continue on from op.consumed+1.
    Consumed=0, produced=0 is an unusual but allowed case. I'm not sure if it could ever happen, but
    no need to change state to TERMINATED in this case. There may be useful state/history
    stored in the PMD, even though no output produced yet.

[Ahmed] Agreed

[Shally] Sounds good.
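
The three cases agreed above can be summarised as a small lookup. This is
only an illustrative sketch: the enums, the bitmask encoding and the
function name are hypothetical and not part of the proposed API. A stateful
op without a stream is treated as invalid here, since case c says the
stream pointer is always there.

```c
/* Hypothetical op types, as discussed in the thread. */
enum op_type { OP_STATELESS, OP_STATEFUL };

/* Hypothetical bit values for the two proposed out-of-space statuses. */
enum oos_bits { OOS_TERMINATED = 1, OOS_RECOVERABLE = 2 };

/* Bitmask of out-of-space statuses a PMD may return for an op,
 * following cases a, b and c in Fiona's list. Returns 0 for the
 * invalid combination of a stateful op with no stream. */
static unsigned allowed_oos(enum op_type type, int has_stream)
{
    if (type == OP_STATEFUL)
        return has_stream ? OOS_RECOVERABLE : 0;      /* case c */
    if (has_stream)
        return OOS_TERMINATED | OOS_RECOVERABLE;      /* case b */
    return OOS_TERMINATED;                            /* case a */
}
```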

quoted

D.2 Compression API Stateful operation
----------------------------------------------------------
A stateful operation in DPDK compression means the application invokes
enqueue_burst() multiple times to process related chunks of data, either
because
- Application broke data into several ops, and/or
- PMD ran into out_of_space situation during input processing

In case of either one or all of the above conditions, the PMD is required to
maintain state of the op across enqueue_burst() calls, and
ops are set up with op_type RTE_COMP_OP_STATEFUL, begin with
flush value = RTE_COMP_NO/SYNC_FLUSH and end at flush value
RTE_COMP_FULL/FINAL_FLUSH.

quoted

D.2.1 Stateful operation state maintenance
---------------------------------------------------------------
It is always an ideal expectation from the application that it should parse
through all related chunks of source data, making its mbuf-chain, and
enqueue it for stateless processing.
However, if it needs to break it into several enqueue_burst() calls, then an
expected call flow would be something like:

quoted

enqueue_burst( |op.no_flush |)

[Ahmed] The work is now in flight to the PMD. The user will call dequeue
burst in a loop until all ops are received. Is this correct?

quoted

dequeue_burst(op) // should dequeue before we enqueue next

[Shally] Yes. Ideally every submitted op needs to be dequeued. However,
this illustration is specifically in the context of stateful op processing,
to reflect that if a stream is broken into chunks, then each chunk should be
submitted as one op at a time with type = STATEFUL and needs to be
dequeued first before the next chunk is enqueued.

quoted

enqueue_burst( |op.no_flush |)
dequeue_burst(op) // should dequeue before we enqueue next
enqueue_burst( |op.full_flush |)

[Ahmed] Why not allow multiple work items in flight? I understand that
occasionally there will be an OUT_OF_SPACE exception. Can we just
distinguish the response in exception cases?

[Shally] Multiple ops are allowed in flight, however the condition is that
each op in such a case is independent of the others, i.e. they belong to
different streams altogether.
Earlier (as part of the RFC v1 doc) we did consider the proposal to process
all related chunks of data in a single burst by passing them as an ops array,
but later found that to be not so useful for PMD handling for various
reasons. Please refer to the RFC v1 doc review comments for the same.

[Fiona] Agree with Shally. In summary, as only one op can be processed at a
time, since each needs the state of the previous, to allow more than 1 op
to be in-flight at a time would force PMDs to implement internal queueing
and exception handling for the OUT_OF_SPACE conditions you mention.
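
The enqueue, dequeue, enqueue flow described in the RFC text can be sketched
as application-side bookkeeping. Everything below is hypothetical
illustration: the flush enum, the tracker struct and the helper names are
not part of the proposed API, and a real application would call the actual
enqueue and dequeue burst functions where the comments indicate.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flush values; the RFC text names NO/SYNC and FULL/FINAL. */
enum flush_flag { FLUSH_NONE, FLUSH_FINAL };

/* Application-side state for one stream (not part of any PMD API). */
struct stream_tracker {
    bool op_in_flight;
};

/* Every chunk but the last carries no flush; the last one ends the stream. */
static enum flush_flag flush_for_chunk(size_t chunk_idx, size_t num_chunks)
{
    return (chunk_idx + 1 == num_chunks) ? FLUSH_FINAL : FLUSH_NONE;
}

/* Returns true if the next chunk of this stream may be enqueued now.
 * The caller would invoke enqueue_burst() only when this returns true. */
static bool try_mark_enqueued(struct stream_tracker *s)
{
    if (s->op_in_flight)
        return false;
    s->op_in_flight = true;
    return true;
}

/* Call once the op for this stream has come back from dequeue_burst(). */
static void mark_dequeued(struct stream_tracker *s)
{
    s->op_in_flight = false;
}
```

Ops from different streams are unaffected by this guard, which matches
Shally's point that multiple ops may be in flight as long as they belong to
different streams.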

[Ahmed] But we are putting the ops on qps which would make them
sequential. Handling OUT_OF_SPACE conditions would be a little bit more
complex but doable.

[Fiona] In my opinion this is not doable, could be very inefficient.
There may be many streams.
The PMD would have to have an internal queue per stream so
it could adjust the next src offset and length in the OUT_OF_SPACE case.
And this may ripple back through all subsequent ops in the stream as each
source len is increased and its dst buffer is not big enough.

[Ahmed] Regarding multi op OUT_OF_SPACE handling.
The caller would still need to adjust the src length/output buffer as you
say. The PMD cannot handle OUT_OF_SPACE internally.
After OUT_OF_SPACE occurs, the PMD should reject all ops in this stream
until it gets explicit confirmation from the caller to continue working on
this stream. Any ops received by the PMD should be returned to the caller
with status STREAM_PAUSED since the caller did not explicitly acknowledge
that it has solved the OUT_OF_SPACE issue.
These semantics can be enabled by adding a new function to the API,
perhaps stream_resume().
This allows the caller to indicate that it acknowledges that it has seen
the issue and this op should be used to resolve the issue. Implementations
that do not support this mode of use can push back immediately after one op
is in flight. Implementations that support this use mode can allow many ops
from the same session
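
A toy model of the pause and resume semantics Ahmed proposes here. It is
not a PMD implementation: the status names follow the thread,
stream_resume() is the function name he floats, and process_op() and its
size parameters are hypothetical stand-ins for real processing.

```c
#include <stddef.h>

/* Hypothetical statuses named after the proposal in the thread. */
enum op_status {
    OP_SUCCESS,
    OP_OUT_OF_SPACE_RECOVERABLE,
    OP_STREAM_PAUSED
};

/* Per-stream state: set once an op on the stream ran out of space. */
struct stream {
    int paused;
};

/* Simulated processing of one op on a stream. The op runs out of space
 * when it needs more output bytes than the caller provided. */
static enum op_status process_op(struct stream *s, size_t out_needed,
                                 size_t out_avail)
{
    if (s->paused)
        return OP_STREAM_PAUSED;    /* rejected until caller acknowledges */
    if (out_needed > out_avail) {
        s->paused = 1;
        return OP_OUT_OF_SPACE_RECOVERABLE;
    }
    return OP_SUCCESS;
}

/* Caller acknowledges the out-of-space event; the stream may continue. */
static void stream_resume(struct stream *s)
{
    s->paused = 0;
}
```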

[Shally] Is it still in the context of having a single burst where all ops
belong to one stream? If yes, I would still say it would add an overhead to
PMDs, especially if they are expected to work closer to HW (which I think is
the case with DPDK PMDs).
Though your approach is doable, why can't all of this be in a layer above
the PMD? i.e. a layer above the PMD can either pass one op at a time with
burst size = 1 OR can make a chained mbuf of input and output and pass that
as one op.
Is it just to ease the chained-mbuf burden on applications, or do you see
any performance/use-case impacting aspect also?

If it is in a context where each op belongs to a different stream in a burst,
then why do we need stream_pause and resume? It is an expectation on the app
to pass more output buffer with consumed + 1 from the next call onwards, as
it has already seen OUT_OF_SPACE.

[Ahmed] Yes, this would add extra overhead to the PMD. Our PMD
implementation rejects all ops that belong to a stream that has entered
"RECOVERABLE" state for one reason or another. The caller must
acknowledge explicitly that it has received news of the problem before
the PMD allows this stream to exit "RECOVERABLE" state. I agree with you
that implementing this functionality in the software layer above the PMD
is a bad idea since the latency reductions are lost.

[Shally] Just reiterating, I rather meant other way around i.e. I see it easier to put all such complexity in a
layer above PMD.

quoted

This setup is useful in latency sensitive applications where the latency
of buffering multiple ops into one op is significant. We found latency
makes a significant difference in search applications where the PMD
competes with software decompression.

[Fiona] I see, so when all goes well, you get best-case latency, but when
out-of-space occurs latency will probably be worse.

[Ahmed] This is exactly right. This use mode assumes out-of-space is a
rare occurrence. Recovering from it should take similar time to
synchronous implementations. The caller gets OUT_OF_SPACE_RECOVERABLE in
both sync and async use. The caller can fix up the op and send it back
to the PMD to continue work just as would be done in sync. Nonetheless,
the added complexity is not justifiable if out-of-space is very common
since the recoverable state will be the limiting factor that forces
synchronicity.

quoted

[Fiona] I still have concerns with this and would not want to support it in our PMD.
To make sure I understand, you want to send a burst of ops, with several from the same stream.
If one causes OUT_OF_SPACE_RECOVERABLE, then the PMD should not process any
subsequent ops in that stream.
Should it return them in a dequeue_burst() with status still NOT_PROCESSED?
Or somehow drop them? How?
While still processing ops from other streams.

[Ahmed] This is exactly correct. It should return them with
NOT_PROCESSED. Yes, the PMD should continue processing other streams.

quoted

As we want to offload each op to hardware with as little CPU processing as possible we
would not want to open up each op to see which stream it's attached to and
make decisions to do per-stream storage, or drop it, or bypass hw and dequeue without processing.

[Ahmed] I think I might have missed your point here, but I will try to
answer. There is no need to "cushion" ops in DPDK. DPDK should send ops
to the PMD and the PMD should reject until stream_continue() is called.
The next op to be sent by the user will have a special marker in it to
inform the PMD to continue working on this stream. Alternatively the
DPDK layer can be made "smarter" to fail during the enqueue by checking
the stream and its state, but like you say this adds additional CPU
overhead during the enqueue.
I am curious. In a simple synchronous use case. How do we prevent users
from putting multiple ops in flight that belong to a single stream? Do
we just currently say it is undefined behavior? Otherwise we would have
to check the stream and incur the CPU overhead.

[Fiona] We don't do anything to prevent it. It's undefined. IMO on data path in
DPDK model we expect good behaviour and don't have to error check for things like this.

[Ahmed] This makes sense. We also assume good behavior.

quoted

In our PMD if we got a burst of 20 ops, we allocate 20 spaces on the hw q, then
build and send those messages. If we found an op from a stream which already
had one inflight, we'd have to hold that back, store in a sw stream-specific holding queue,
only send 19 to hw. We cannot send multiple ops from same stream to
the hw as it fans them out and does them in parallel.
Once the enqueue_burst() returns, there is no processing
context which would spot that the first has completed
and send the next op to the hw. On a dequeue_burst() we would spot this,
in that context could process the next op in the stream.
On out of space, instead of processing the next op we would have to transfer
all unprocessed ops from the stream to the dequeue result.
Some parts of this are doable, but seems likely to add a lot more latency,
we'd need to add extra threads and timers to move ops from the sw
queue to the hw q to get any benefit, and these constructs would add
context switching and CPU cycles. So we prefer to push this responsibility
to above the API and it can achieve similar.

[Ahmed] I see what you mean. Our workflow is almost exactly the same
with our hardware, but the fanning out is done by the hardware based on
the stream and ops that belong to the same stream are never allowed to
go out of order. Otherwise the data would be corrupted. Likewise the
hardware is responsible for checking the state of the stream and
returning frames as NOT_PROCESSED to the software

quoted

Maybe we could add a capability if this behaviour is important for you?
e.g. ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS ?
Our PMD would set this to 0. And expect no more than one op from a stateful stream
to be in flight at any time.

[Ahmed] That makes sense. This way the different DPDK implementations do
not have to add extra checking for unsupported cases.
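
A sketch of how an application could honour the capability Fiona suggests.
The flag name comes from her message, but its bit position, the feature
flags word and the helper name are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical bit for the capability named in the thread. */
#define FF_ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS (1ULL << 0)

/* Number of ops from one stateful stream the application may keep in
 * flight: the requested depth if the PMD advertises the capability,
 * otherwise at most one. */
static unsigned max_inflight_per_stream(uint64_t feature_flags,
                                        unsigned wanted)
{
    if (wanted == 0)
        return 0;
    if (feature_flags & FF_ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS)
        return wanted;
    return 1;
}
```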

[Shally] @ahmed, if I summarise your use-case, is this how you want the PMD to support it?
- a burst *carries only one stream* and all ops are then assumed to belong to that stream? (please note,
here the burst is not carrying more than one stream)

[Ahmed] No. In this use case the caller sets up an op and enqueues a
single op. Then before the response comes back from the PMD the caller
enqueues a second op on the same stream.

quoted

-PMD will submit one op at a time to HW?

[Ahmed] I misunderstood what PMD means. I used it throughout to mean the
HW. I used DPDK to mean the software implementation that talks to the
hardware.
The software will submit all ops immediately. The hardware has to figure
out what to do with the ops depending on what stream they belong to.

quoted

-if processed successfully, push it back to completion queue with status = SUCCESS. If it failed or ran
into OUT_OF_SPACE, then push it to completion queue with status = FAILURE/
OUT_OF_SPACE_RECOVERABLE and the rest with status = NOT_PROCESSED, and return with enqueue count
= total # of ops submitted originally with burst?

[Ahmed] This is exactly what I had in mind. all ops will be submitted to
the HW. The HW will put all of them on the completion queue with the
correct status exactly as you say.

quoted

-app assumes all have been enqueued, so it go and dequeue all ops
-on seeing an op with OUT_OF_SPACE_RECOVERABLE, app resubmit a burst of ops with call to
stream_continue/resume API starting from op which encountered OUT_OF_SPACE and others as
NOT_PROCESSED with updated input and output buffer?

[Ahmed] Correct this is what we do today in our proprietary API.

quoted

-repeat until *all* are dequeued with status = SUCCESS or *any* with status = FAILURE? If anytime
failure is seen, then app start whole processing all over again or just drop this burst?!

[Ahmed] The app has the choice on how to proceed. If the issue is
recoverable then the application can continue this stream from where it
stopped. if the failure is unrecoverable then the application should
first fix the problem and start from the beginning of the stream.
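
The application-side recovery walked through above might look like the
following sketch for the ops of a single stream, in submission order. The
status names mirror the thread; the function name and its return
convention are hypothetical.

```c
#include <stddef.h>

/* Hypothetical statuses named after those used in the thread. */
enum op_status {
    OP_SUCCESS,
    OP_OUT_OF_SPACE_RECOVERABLE,
    OP_NOT_PROCESSED,
    OP_FAILURE
};

/* Scan the dequeued statuses of one stream's ops. Indices of ops that
 * must be resubmitted (the one that hit out-of-space and every op
 * returned unprocessed) are written to idx, which must hold n entries.
 * Returns the number of indices written, or -1 if any op failed
 * unrecoverably and the stream has to be restarted from the beginning. */
static int collect_resubmit(const enum op_status *status, size_t n,
                            size_t *idx)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (status[i] == OP_FAILURE)
            return -1;
        if (status[i] == OP_OUT_OF_SPACE_RECOVERABLE ||
            status[i] == OP_NOT_PROCESSED)
            idx[count++] = i;
    }
    return count;
}
```

The first index collected is the op that ran out of space; the caller would
give it a larger output buffer and adjusted input before resubmitting.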

quoted

If all of above is true, then I think we should add another API such as rte_comp_enque_single_stream()
which will be functional under Feature Flag = ALLOW_ENQUEUE_MULTIPLE_STATEFUL_OPS or better
name is SUPPORT_ENQUEUE_SINGLE_STREAM?!

[Ahmed] The main advantage in async use is lost if we force all related
ops to be in the same burst. if we do that, then we might as well merge
all the ops into one op. That would reduce the overhead.
The use mode I am proposing is only useful in cases where the data
becomes available after the first enqueue occurred. I want to allow the
caller to enqueue the second set of data as soon as it is available
regardless of whether or not the HW has already started working on the
first op inflight.

[Shally] @ahmed, OK, it seems I missed a point here. So, please confirm the following:

As per the current description in the doc, expected stateful usage is:
enqueue(op1) --> dequeue(op1) --> enqueue(op2)

but you're suggesting to allow an option to change it to

enqueue(op1) --> enqueue(op2)

i.e. multiple ops from the same stream can be put in-flight via subsequent
enqueue_burst() calls without waiting to dequeue previous ones, as the PMD
supports it. So, no change to the current definition of a burst. It will
still carry multiple streams, with each op belonging to a different stream?!
If yes, then it seems your HW can be set up for multiple streams, so it is
efficient for your case to support it in the DPDK PMD layer, but our HW
doesn't by default and needs SW to back it. Given that, I also suggest
enabling it under some feature flag.

However, it looks like an add-on, and if it doesn't change the current
definition of a burst and the minimum expectation set on stateful processing
described in this document, then IMO you can propose this feature as an
incremental patch on the baseline version, in the absence of which the
application will exercise stateful processing as described here
(enq->deq->enq). Thoughts?

quoted

[Fiona] Am curious about Ahmed's response to this. I didn't get that a burst should carry only one stream,
or get how this makes a difference, as there can be many enqueue_burst() calls done before a dequeue_burst().
Maybe you're thinking the enqueue_burst() would be a blocking call that would not return until all the ops
had been processed? This would turn it into a synchronous call which isn't the intent.

[Ahmed] Agreed, a blocking or even a buffering software layer that
babysits the hardware does not fundamentally change the parameters of the
system as a whole. It just moves workflow management complexity down
into the DPDK software layer. Rather there are real latency and
throughput advantages (because of caching) that I want to expose.

/// snip ///
