Re: [RFC v2] doc compression API for DPDK

From: Verma, Shally <hidden>
Date: 2018-01-12 13:49:22

Hi Fiona

-----Original Message-----
From: Trahe, Fiona [mailto:fiona.trahe@intel.com]
Sent: 12 January 2018 00:24
To: Verma, Shally <redacted>; Ahmed Mansour
[off-list ref]; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>;
Gupta, Ashish [off-list ref]; Sahu, Sunila
[off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal
[off-list ref]; Jain, Deepak K [off-list ref];
Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]; Trahe,
Fiona [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Shally, Ahmed,

-----Original Message-----
From: Verma, Shally [mailto:Shally.Verma@cavium.com]
Sent: Wednesday, January 10, 2018 12:55 PM
To: Ahmed Mansour <redacted>; Trahe, Fiona
[off-list ref]; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>;
Gupta, Ashish [off-list ref]; Sahu, Sunila [off-list ref];
De Lara Guarch, Pablo [off-list ref]; Challa, Mahipal
[off-list ref]; Jain, Deepak K [off-list ref];
Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: RE: [RFC v2] doc compression API for DPDK

Hi Ahmed

-----Original Message-----
From: Ahmed Mansour [mailto:ahmed.mansour@nxp.com]
Sent: 10 January 2018 00:38
To: Verma, Shally <redacted>; Trahe, Fiona
[off-list ref]; dev@dpdk.org
Cc: Athreya, Narayana Prasad <redacted>;
Gupta, Ashish [off-list ref]; Sahu, Sunila
[off-list ref]; De Lara Guarch, Pablo
[off-list ref]; Challa, Mahipal
[off-list ref]; Jain, Deepak K [off-list ref];
Hemant Agrawal [off-list ref]; Roy Pledge
[off-list ref]; Youri Querry [off-list ref]
Subject: Re: [RFC v2] doc compression API for DPDK

Hi Shally,

Thanks for the summary. It is very helpful. Please see comments below


On 1/4/2018 6:45 AM, Verma, Shally wrote:

This is an RFC v2 document summarizing the understanding of, and requirements on, the compression API proposal in DPDK. It is based on "[RFC v3] Compression API
https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdpdk.org%2Fdev%2Fpatchwork%2Fpatch%2F32331%2F&data=02%7C01%7Cahmed.mansour%40nxp.com%7C80bd3270430c473fa71d08d55368a0e1%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636506631207323264&sdata=JFtOnJxajgXX7s3DMZ79K7VVM7TXO8lBd6rNeVlsHDg%3D&reserved=0 ".
The intention of this document is to align on the concepts built into the compression API and its usage, and to identify further requirements.

Going further, it could be a base for the Compression Module Programmer Guide.

Current scope is limited to
- definition of the terminology which makes up the foundation of the compression API
- typical API flow expected to be used by applications
- Stateless and Stateful operation definition and usage after RFC v1 doc review
https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdev.dpdk.narkive.com%2FCHS5l01B%2Fdpdk-dev-rfc-v1-doc-compression-api-dpdk&data=02%7C01%7Cahmed.mansour%40nxp.com%7C80bd3270430c473fa71d08d55368a0e1%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636506631207323264&sdata=Fy7xKIyxZX97i7vEM6NqgrvnqKrNrWOYLwIA5dEHQNQ%3D&reserved=0

1. Overview
~~~~~~~~~~~

A. Compression Methodologies in compression API
===========================================
DPDK compression supports two types of compression methodologies:
- Stateless - each data object is compressed individually without any reference to previous data,
- Stateful - each data object is compressed with reference to the previous data object, i.e. history of data is needed for compression / decompression

For more explanation, please refer to RFC
https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ietf.org%2Frfc%2Frfc1951.txt&data=02%7C01%7Cahmed.mansour%40nxp.com%7C80bd3270430c473fa71d08d55368a0e1%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636506631207323264&sdata=pfp2VX1w3UxH5YLcL2R%2BvKXNeS7jP46CsASq0B1SETw%3D&reserved=0

To support both methodologies, DPDK compression introduces two key concepts: Session and Stream.

B. Notion of a session in compression API
==================================
A Session in DPDK compression is a logical entity which is set up one time with immutable parameters, i.e. parameters that don't change across operations and devices.
A session can be shared across multiple devices and multiple operations simultaneously.
Typical Session parameters include info such as:
- compress / decompress
- compression algorithm and associated configuration parameters

An application can create different sessions on a device, initialized with the same or different xforms. Once a session is initialized with one xform it cannot be re-initialized.

C. Notion of stream in compression API
=======================================
Unlike a session, which carries a common set of information across operations, a stream in DPDK compression is a logical entity which identifies a related set of operations and carries operation-specific information as needed by the device during its processing.
It is a device-specific data structure which is opaque to the application, and is set up and maintained by the device.

A stream can be used with *only* one op at a time, i.e. no two operations can share the same stream simultaneously.
A stream is a *must* for stateful ops processing and optional for stateless (please see the respective sections for more details).

This enables sharing of a session by multiple threads handling different data sets, as each op carries its own context (internal states, history buffers, etc.) in its attached stream.

The application should call rte_comp_stream_create() and attach the stream to the op before the beginning of operation processing, and free it via rte_comp_stream_free() after it is complete.

C. Notion of burst operations in compression API
=======================================
A burst in DPDK compression is an array of operations where each op carries an independent set of data, i.e. a burst can look like:

             ------------------------------------------------------------------------------
enque_burst (|op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
             ------------------------------------------------------------------------------

Where op1 .. op5 are all independent of each other and carry entirely different sets of data.
Each op can be attached to the same or a different session but *must* be attached to a different stream.

Each op (struct rte_comp_op) carries compression/decompression operational parameters and is both an input and an output parameter.
The PMD gets source, destination and checksum information at input and updates it with bytes consumed and produced and the checksum at output.

Since each operation in a burst is independent and thus can complete out-of-order, applications which need ordering should set up the per-op user data area with reordering information so that they can determine the enqueue order at dequeue.

Also, if multiple threads call enqueue_burst() on the same queue pair, then the onus is on the application to use a proper locking mechanism to ensure exclusive enqueuing of operations.
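The reordering advice above can be sketched as follows. This is a minimal, self-contained illustration and not DPDK code: struct toy_op, its seq field, stamp_enqueue_order() and restore_enqueue_order() are hypothetical names standing in for an op with a per-op user data area. The application stamps each op with its enqueue position and uses that stamp to put completed ops back in order at dequeue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an op that has a per-op user data area. */
struct toy_op {
    uint32_t seq;     /* enqueue order, written by the app before enqueue */
    const char *tag;  /* placeholder for the op's real payload */
};

/* Stamp each op with its position in the burst before enqueueing. */
static void stamp_enqueue_order(struct toy_op **ops, size_t nb_ops,
                                uint32_t first_seq)
{
    for (size_t i = 0; i < nb_ops; i++)
        ops[i]->seq = first_seq + (uint32_t)i;
}

/* Place ops that were dequeued in any order back into enqueue order. */
static void restore_enqueue_order(struct toy_op **dequeued, size_t nb_ops,
                                  uint32_t first_seq, struct toy_op **ordered)
{
    for (size_t i = 0; i < nb_ops; i++) {
        uint32_t slot = dequeued[i]->seq - first_seq;
        assert(slot < nb_ops);  /* the op must belong to this burst */
        ordered[slot] = dequeued[i];
    }
}
```

An application would call stamp_enqueue_order() just before enqueue and restore_enqueue_order() once the whole burst has been dequeued; an application that only needs partial ordering could instead inspect seq per op as it arrives.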

D. Stateless Vs Stateful
===================
Compression API provides the RTE_COMP_FF_STATEFUL feature flag for a PMD to reflect its support for Stateful operation. Each op carries an op type indicating if it's to be processed stateful or stateless.

D.1 Compression API Stateless operation
------------------------------------------------------
An op is processed stateless if it has
- flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on compression side),
- op_type set to RTE_COMP_OP_STATELESS
- all of the required input and a sufficiently large output buffer to store the output, i.e. OUT_OF_SPACE can never occur.

When all of the above conditions are met, the PMD initiates stateless processing and releases acquired resources after processing of the current operation is complete, i.e. full input consumed and full output written.

[Fiona] I think 3rd condition conflicts with D1.1 below and anyway cannot be a precondition, i.e. the PMD must initiate stateless processing based on RTE_COMP_OP_STATELESS.
It can't always know if the output buffer is big enough before processing; it must process the input data, and only when it has consumed it all can it know whether all the output data fits in the output buffer.

I'd suggest rewording as follows:
An op is processed statelessly if op_type is set to RTE_COMP_OP_STATELESS.
In this case
- The flush value must be set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on compression side),
- All of the input data must be in the src buffer
- The dst buffer should be large enough to hold the expected output
The PMD acquires the necessary resources to process the op. After processing of the current operation is complete, whether successful or not, it releases acquired resources and no state, history or data is held in the PMD or carried over to subsequent ops.
In SUCCESS case full input is consumed and full output written and status is set to RTE_COMP_OP_STATUS_SUCCESS.
OUT-OF-SPACE as D1.1 below.

[Shally] Ok. Agreed.

Application can optionally attach a stream to such ops. In such a case, the application must attach a different stream to each op.

Application can enqueue a stateless burst by making consecutive enque_burst() calls, i.e. the following is relevant usage:

enqueued = rte_comp_enque_burst(dev_id, qp_id, ops1, nb_ops);
enqueued = rte_comp_enque_burst(dev_id, qp_id, ops2, nb_ops);

*Note - Every call has a different ops array, i.e. the same rte_comp_op array *cannot be re-enqueued* to process the next batch of data until the previous ones are completely processed.

D.1.1 Stateless and OUT_OF_SPACE
------------------------------------------------
OUT_OF_SPACE is a condition when the output buffer runs out of space and where the PMD still has more data to produce. If the PMD runs into such a condition, then it's an error condition in stateless processing.
In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced=consumed=0, i.e. no input read, no output written.
Application can resubmit the full input with a larger output buffer size.

[Ahmed] Can we add an option to allow the user to read the data that was produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.

[Shally] It is there but applicable for stateful operation type (please refer to handling out_of_space under "Stateful Section").
By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size for certain and ensures that the uncompressed data size cannot grow beyond the provided output buffer.
Such apps can submit an op with type = STATELESS and provide full input; the PMD then assumes it has sufficient input and output and thus doesn't need to maintain any context after the op is processed.
If the application doesn't know the max output size, then it should process it as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream so that the PMD can maintain relevant context to handle such a condition.

[Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept.
In the Stateless case where a PMD reports OUT_OF_SPACE in the decompression case it could also return consumed=0, produced = x, where x>0. X indicates the amount of valid data which has been written to the output buffer. It is not complete, but if an application wants to search it, it may be sufficient.
If the application still wants the data it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on as the PMD has not maintained state, history or data.
I don't think there would be any need to indicate this in capabilities; PMDs which cannot provide this functionality would always return produced=consumed=0, while PMDs which can could set produced > 0.
If this works for you both, we could consider a similar case for compression.

[Shally] Sounds fine to me. Though in that case, consumed should also be updated to the actual number of bytes consumed by the PMD.
Setting consumed = 0 with produced > 0 doesn't correlate.
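The stateless OUT_OF_SPACE handling described in D.1.1 can be sketched with a toy model. Everything below is hypothetical and only illustrates the resubmit pattern: toy_stateless_decomp() stands in for a PMD processing one stateless op (it expands simple count/byte pairs, not DEFLATE), and decomp_with_retry() is the application side. The toy follows D.1.1 as written (produced=consumed=0 on OUT_OF_SPACE), not the partial-output variant discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum toy_status { TOY_SUCCESS, TOY_OUT_OF_SPACE };

struct toy_result {
    enum toy_status status;
    size_t consumed;
    size_t produced;
};

/* Toy stateless "decompressor": src is a sequence of (count, byte) pairs. */
static struct toy_result toy_stateless_decomp(const uint8_t *src, size_t src_len,
                                              uint8_t *dst, size_t dst_len)
{
    struct toy_result r = { TOY_SUCCESS, 0, 0 };

    assert(src_len % 2 == 0);
    for (size_t i = 0; i < src_len; i += 2) {
        size_t run = src[i];
        if (r.produced + run > dst_len) {
            /* Stateless rule from D.1.1: nothing consumed, nothing produced. */
            r.status = TOY_OUT_OF_SPACE;
            r.consumed = 0;
            r.produced = 0;
            return r;
        }
        for (size_t k = 0; k < run; k++)
            dst[r.produced++] = src[i + 1];
        r.consumed = i + 2;
    }
    return r;
}

/* Application side: resubmit the full input with a doubled dst buffer
 * until it fits. Returns a malloc'd buffer the caller must free, or NULL
 * if allocation fails. */
static uint8_t *decomp_with_retry(const uint8_t *src, size_t src_len,
                                  size_t initial_dst_len, size_t *out_len)
{
    size_t dst_len = initial_dst_len ? initial_dst_len : 1;

    for (;;) {
        uint8_t *dst = malloc(dst_len);
        if (dst == NULL)
            return NULL;
        struct toy_result r = toy_stateless_decomp(src, src_len, dst, dst_len);
        if (r.status == TOY_SUCCESS) {
            *out_len = r.produced;
            return dst;
        }
        free(dst);      /* no state was kept, so start over from scratch */
        dst_len *= 2;
    }
}
```

Doubling the buffer is just one possible growth policy; an application that knows an upper bound on the output size would size the buffer once and avoid the retry entirely, which is the case the stateless definition assumes.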

D.2 Compression API Stateful operation
----------------------------------------------------------
A Stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process related chunks of data, either because
- Application broke data into several ops, and/or
- PMD ran into out_of_space situation during input processing

In case of either one or all of the above conditions, the PMD is required to maintain the state of the op across enque_burst() calls, and ops are set up with op_type RTE_COMP_OP_STATEFUL, beginning with flush value = RTE_COMP_NO/SYNC_FLUSH and ending at flush value RTE_COMP_FULL/FINAL_FLUSH.

D.2.1 Stateful operation state maintenance
---------------------------------------------------------------
It is always an ideal expectation from the application that it should parse through all related chunks of source data, making its mbuf-chain, and enqueue it for stateless processing.
However, if it needs to break it into several enqueue_burst() calls, then an expected call flow would be something like:

enqueue_burst( |op.no_flush |)

[Ahmed] The work is now in flight to the PMD. The user will call dequeue burst in a loop until all ops are received. Is this correct?

deque_burst(op) // should dequeue before we enqueue next

[Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in the context of stateful op processing, to reflect that if a stream is broken into chunks, then each chunk should be submitted as one op at a time with type = STATEFUL and needs to be dequeued first before the next chunk is enqueued.

enqueue_burst( |op.no_flush |)
deque_burst(op) // should dequeue before we enqueue next
enqueue_burst( |op.full_flush |)

[Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish the response in exception cases?

[Shally] Multiple ops are allowed in flight; however, the condition is that each op in such a case is independent of the others, i.e. they belong to different streams altogether.
Earlier (as part of RFC v1 doc) we did consider the proposal to process all related chunks of data in a single burst by passing them as an ops array, but later found that to be not so useful for PMD handling for various reasons. Please refer to the RFC v1 doc review comments for the same.

[Fiona] Agree with Shally. In summary, as only one op can be processed at a time, since each needs the state of the previous, to allow more than 1 op to be in-flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.
If the application has all the data, it can put it into chained mbufs in a single op rather than multiple ops, which avoids pushing all that complexity down to the PMDs.

Here an op *must* be attached to a stream and every subsequent enqueue_burst() call should carry the *same* stream. Since the PMD maintains op state in the stream, it is mandatory for the application to attach a stream to such

[Fiona] I think you're referring only to a single stream above, but as there may be many different streams, maybe add the following?
Above is simplified to show just a single stream. However there may be many streams, and each enqueue_burst() may contain ops from different streams, as long as there is only one op in-flight from any stream at a given time.
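The "one op in flight per stream" rule can be sketched as an application-side guard. This is only an illustration under stated assumptions: the RFC text does not say who enforces the rule, and struct toy_stream, struct toy_op, enqueue_allowed_prefix() and mark_dequeued() are hypothetical names, not part of the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side bookkeeping for one stream. */
struct toy_stream {
    bool in_flight;   /* true while an op on this stream awaits dequeue */
};

struct toy_op {
    struct toy_stream *stream;
};

/* Mark ops from the front of the burst as in flight and return how many
 * may be enqueued. Stops at the first op whose stream is already busy. */
static size_t enqueue_allowed_prefix(struct toy_op **ops, size_t nb_ops)
{
    size_t n;

    for (n = 0; n < nb_ops; n++) {
        struct toy_stream *s = ops[n]->stream;
        assert(s != NULL);   /* stateful ops must carry a stream */
        if (s->in_flight)
            break;           /* would be a second op on the same stream */
        s->in_flight = true;
    }
    return n;
}

/* Call when an op has been dequeued, freeing its stream for the next chunk. */
static void mark_dequeued(struct toy_op *op)
{
    assert(op->stream != NULL && op->stream->in_flight);
    op->stream->in_flight = false;
}
```

A burst holding ops from streams s1, s2 and s1 again would be cut after the second op; the third can be enqueued once the first op on s1 has been dequeued.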

[Shally] Ok get it.

D.2.2 Stateful and Out_of_Space
--------------------------------------------
If a PMD supports stateful and runs into an OUT_OF_SPACE situation, then it is not an error condition for the PMD. In such a case, the PMD returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE with consumed = number of input bytes read and produced = length of complete output buffer.

[Fiona] - produced would be <= output buffer len (typically =, but could be a
few bytes less)

Application should enqueue op with source starting at consumed+1 and output buffer with available space.
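The resume-after-OUT_OF_SPACE flow can be sketched with a toy stateful transform. All names below are hypothetical: toy_stateful_op() stands in for a PMD processing one stateful op (it delta-encodes bytes against history kept in the stream, not DEFLATE), and process_in_chunks() is the application loop. The sketch reads "consumed+1" as "the first unread byte", which with 0-based offsets is src + consumed; that reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_status { TOY_SUCCESS, TOY_OUT_OF_SPACE };

/* Hypothetical stream: the history the toy PMD keeps between ops. */
struct toy_stream {
    uint8_t prev;
};

struct toy_result {
    enum toy_status status;
    size_t consumed;
    size_t produced;
};

/* Toy stateful transform: delta-encode src against the last byte seen. */
static struct toy_result toy_stateful_op(struct toy_stream *s,
                                         const uint8_t *src, size_t src_len,
                                         uint8_t *dst, size_t dst_len)
{
    struct toy_result r = { TOY_SUCCESS, 0, 0 };

    while (r.consumed < src_len) {
        if (r.produced == dst_len) {
            r.status = TOY_OUT_OF_SPACE;  /* not an error: state is kept */
            return r;
        }
        dst[r.produced++] = (uint8_t)(src[r.consumed] - s->prev);
        s->prev = src[r.consumed++];
    }
    return r;
}

/* Application loop: submit one op at a time, each with at most `chunk`
 * bytes of output space, resuming at the byte offset reported as consumed.
 * Returns the total number of bytes produced. */
static size_t process_in_chunks(struct toy_stream *s,
                                const uint8_t *src, size_t src_len,
                                uint8_t *out, size_t out_cap, size_t chunk)
{
    size_t in_off = 0, out_off = 0;

    assert(chunk > 0);
    for (;;) {
        size_t room = out_cap - out_off;
        size_t dst_len = room < chunk ? room : chunk;
        struct toy_result r = toy_stateful_op(s, src + in_off, src_len - in_off,
                                              out + out_off, dst_len);
        in_off += r.consumed;
        out_off += r.produced;
        if (r.status == TOY_SUCCESS || dst_len == 0)
            return out_off;   /* done, or no output space left at all */
    }
}
```

Because the stream carries the history byte across ops, the output is identical whether the input is processed in one op or in several; that is the property a stateful PMD has to preserve across OUT_OF_SPACE.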

[Ahmed] Related to OUT_OF_SPACE. What status does the user receive in a decompression case when the end block is encountered before the end of the input? Does the PMD continue decomp? Does it stop there and return the stop index?

[Shally] Before I can answer this, please help me understand your use case. When you say "when the end block is encountered before the end of the input?" do you mean
"Decompressor processes a final block (i.e. has BFINAL=1 in its header) and there's some footer data after that?" Or
do you mean "decompressor processes one block and has more to process till its final block?"
What do "end block" and "end of input" refer to here?

D.2.3 Sliding Window Size
------------------------------------
Every PMD will reflect in its algorithm capability structure the maximum length of the Sliding Window in bytes, which would indicate the maximum history buffer length used by the algo.

2. Example API illustration
~~~~~~~~~~~~~~~~~~~~~~~

[Fiona] I think it would be useful to show an example of both a STATELESS
flow and a STATEFUL flow.

[Shally] Ok. I can add simplified version to illustrate API usage in both cases.

Following is an illustration on API usage (this is just one flow; other variants are also possible):

1. rte_comp_session *sess = rte_compressdev_session_create (rte_mempool *pool);
2. rte_compressdev_session_init (int dev_id, rte_comp_session *sess, rte_comp_xform *xform, rte_mempool *sess_pool);
3. rte_comp_op_pool_create(rte_mempool ..)
4. rte_comp_op_bulk_alloc (struct rte_mempool *mempool, struct rte_comp_op **ops, uint16_t nb_ops);
5. for every rte_comp_op in ops[],
    5.1 rte_comp_op_attach_session (rte_comp_op *op, rte_comp_session *sess);
    5.2 op.op_type = RTE_COMP_OP_STATELESS
    5.3 op.flush = RTE_FLUSH_FINAL
6. [Optional] for every rte_comp_op in ops[],
    6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
    6.2 rte_comp_op_attach_stream(rte_comp_op *op, rte_comp_session *stream);

[Ahmed] What is the semantic effect of attaching a stream to every op? Will this application benefit from this, given that it is set up with op_type STATELESS?

[Shally] By role, a stream is a data structure that holds all the information that a PMD needs to maintain for processing an op, and thus it's marked device specific. It is required for stateful processing but optional for stateless, as the PMD doesn't need to maintain context once the op is processed, unlike stateful.
It may be of advantage to some PMDs to use a stream for stateless. They can be designed to do one-time per-op setup (such as mapping session params) during stream_create() in the control path rather than the data path.

[Fiona] Yes, we agreed that stream_create() should be called for every session and if it returns non-NULL the PMD needs it, so op_attach_stream() must be called.
However I've just realised we don't have a STATEFUL/STATELESS param on the xform, just on the op.
So we could either add stateful/stateless param to stream_create() ?
OR add stateful/stateless param to xform so it would be in session?

[Shally] No it shouldn't be as part of session or xform as sessions aren't stateless/stateful.
So, we shouldn't alter the current definition of session or xforms.
If we need to mention it, then it could be added as part of stream_create() as it's device specific and depending upon op_type() device can then setup stream resources.

However, Shally, can you reconsider if you really need it for STATELESS or if the data you want to store there could be stored in the session? Or if it's needed per-op, does it really need to be visible on the API as a stream or could it be hidden within the PMD?

[Shally] I would say it is not mandatory but a desirable feature that I am suggesting. 
I am only trying to enable optimization in data path which may be of help to some PMD designs as they can use stream_create() to do setup that are 1-time per op and regardless of op_type, such as I mentioned, setting up user session params to device sess params.
We can hide it inside PMD however there may be slight overhead in datapath depending on PMD design.
But I would say, it's not a blocker for us to freeze on current spec. We can revisit this feature later because it will not alter base API functionality.

Thanks
Shally

7. for every rte_comp_op in ops[],
     7.1 set up with src/dst buffer
8. enq = rte_compressdev_enqueue_burst (dev_id, qp_id, &ops, nb_ops);
9. do while (dqu < enq) // Wait till all of enqueued are dequeued
    9.1 dqu = rte_compressdev_dequeue_burst (dev_id, qp_id, &ops, enq);

[Ahmed] I am assuming that waiting for all enqueued to be dequeued is not strictly necessary, but is just the chosen example in this case

[Shally] Yes. By design, for burst_size>1 each op is independent of the others. So the app may proceed as soon as it dequeues any.

10. Repeat 7 for next batch of data
11. for every ops in ops[]
      11.1 rte_comp_stream_free(op->stream);
12. rte_comp_session_clear (sess) ;
13. rte_comp_session_terminate(rte_comp_session *session)

Thanks
Shally
