Thread (266 messages) 266 messages, 14 authors, 2021-11-05

Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

From: Jerin Jacob <hidden>
Date: 2021-09-28 15:00:00

On Tue, Sep 28, 2021 at 8:10 PM Xueming(Steven) Li [off-list ref] wrote:
On Tue, 2021-09-28 at 13:59 +0000, Ananyev, Konstantin wrote:
quoted
quoted
On Tue, Sep 28, 2021 at 6:55 PM Xueming(Steven) Li
[off-list ref] wrote:
quoted
On Tue, 2021-09-28 at 18:28 +0530, Jerin Jacob wrote:
quoted
On Tue, Sep 28, 2021 at 5:07 PM Xueming(Steven) Li
[off-list ref] wrote:
quoted
On Tue, 2021-09-28 at 15:05 +0530, Jerin Jacob wrote:
quoted
On Sun, Sep 26, 2021 at 11:06 AM Xueming(Steven) Li
[off-list ref] wrote:
quoted
On Wed, 2021-08-11 at 13:04 +0100, Ferruh Yigit wrote:
quoted
On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote:
quoted
quoted
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Wednesday, August 11, 2021 4:03 PM
To: Xueming(Steven) Li <redacted>
Cc: dpdk-dev <redacted>; Ferruh Yigit
[off-list ref]; NBU-Contact-Thomas
Monjalon
[off-list ref];
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
Andrew Rybchenko [off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev:
introduce shared Rx queue

On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li
[off-list ref] wrote:
quoted
Hi,
quoted
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Monday, August 9, 2021 9:51 PM
To: Xueming(Steven) Li <redacted>
Cc: dpdk-dev <redacted>; Ferruh Yigit
[off-list ref];
NBU-Contact-Thomas Monjalon
[off-list ref]; Andrew Rybchenko
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev:
introduce shared Rx queue

On Mon, Aug 9, 2021 at 5:18 PM Xueming Li
[off-list ref] wrote:
quoted
In current DPDK framework, each RX queue is
pre-loaded with mbufs
for incoming packets. When number of
representors scale out in a
switch domain, the memory consumption became
significant. Most
important, polling all ports leads to high
cache miss, high
latency and low throughput.

This patch introduces shared RX queue. Ports
with same
configuration in a switch domain could share
RX queue set by specifying sharing group.
Polling any queue using same shared RX queue
receives packets from
all member ports. Source port is identified
by mbuf->port.

Port queue number in a shared group should be
identical. Queue
index is
1:1 mapped in shared group.

Share RX queue is supposed to be polled on
same thread.

Multiple groups is supported by group ID.
Is this offload specific to the representor? If
so can this name be changed specifically to
representor?
Yes, PF and representor in switch domain could
take advantage.
quoted
If it is for a generic case, how the flow
ordering will be maintained?
Not quite sure that I understood your question.
The control path of is
almost same as before, PF and representor port
still needed, rte flows not impacted.
Queues still needed for each member port,
descriptors(mbuf) will be
supplied from shared Rx queue in my PMD
implementation.
My question was if create a generic
RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload, multiple
ethdev receive queues land into
the same
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
receive queue, In that case, how the flow order is
maintained for respective receive queues.
I guess the question is testpmd forward stream? The
forwarding logic has to be changed slightly in case
of shared rxq.
basically for each packet in rx_burst result, lookup
source stream according to mbuf->port, forwarding to
target fs.
Packets from same source port could be grouped as a
small burst to process, this will accelerates the
performance if traffic
come from
quoted
quoted
quoted
quoted
quoted
quoted
quoted
limited ports. I'll introduce some common api to do
shard rxq forwarding, call it with packets handling
callback, so it suites for
all forwarding engine. Will sent patches soon.
All ports will put the packets in to the same queue
(share queue), right? Does
this means only single core will poll only, what will
happen if there are
multiple cores polling, won't it cause problem?

And if this requires specific changes in the
application, I am not sure about
the solution, can't this work in a transparent way to
the application?
Discussed with Jerin, new API introduced in v3 2/8 that
aggregate ports
in same group into one new port. Users could schedule
polling on the
aggregated port instead of all member ports.
The v3 still has testpmd changes in fastpath. Right? IMO,
For this
feature, we should not change fastpath of testpmd
application. Instead, testpmd can use aggregated ports
probably as
separate fwd_engine to show how to use this feature.
Good point to discuss :) There are two strategies to polling
a shared
Rxq:
1. polling each member port
   All forwarding engines can be reused to work as before.
   My testpmd patches are efforts towards this direction.
   Does your PMD support this?
Not unfortunately. More than that, every application needs to
change
to support this model.
Both strategies need user application to resolve port ID from
mbuf and
process accordingly.
This one doesn't demand aggregated port, no polling schedule
change.
I was thinking, mbuf will be updated from driver/aggregator port as
when it
comes to application.
quoted
quoted
quoted
2. polling aggregated port
   Besides forwarding engine, need more work to to demo it.
   This is an optional API, not supported by my PMD yet.
We are thinking of implementing this PMD when it comes to it,
ie.
without application change in fastpath
logic.
Fastpath have to resolve port ID anyway and forwarding according
to
logic. Forwarding engine need to adapt to support shard Rxq.
Fortunately, in testpmd, this can be done with an abstract API.

Let's defer part 2 until some PMD really support it and tested,
how do
you think?
We are not planning to use this feature so either way it is OK to
me.
I leave to ethdev maintainers decide between 1 vs 2.

I do have a strong opinion not changing the testpmd basic forward
engines
for this feature.I would like to keep it simple as fastpath
optimized and would
like to add a separate Forwarding engine as means to verify this
feature.
+1 to that.
I don't think it a 'common' feature.
So separate FWD mode seems like a best choice to me.
-1 :)
There was some internal requirement from test team, they need to verify
Internal QA requirements may not be the driving factor :-)
all features like packet content, rss, vlan, checksum, rte_flow... to
be working based on shared rx queue. Based on the patch, I believe the
impact has been minimized.
quoted
quoted

quoted
quoted
quoted
quoted
quoted
quoted
Overall, is this for optimizing memory for the port
represontors? If so can't we
have a port representor specific solution, reducing
scope can reduce the
complexity it brings?
quoted
quoted
If this offload is only useful for representor
case, Can we make this offload specific to
representor the case by changing its
name and
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
scope.
It works for both PF and representors in same switch
domain, for application like OVS, few changes to
apply.
quoted
quoted
quoted
quoted
Signed-off-by: Xueming Li
[off-list ref]
---
 doc/guides/nics/features.rst
| 11 +++++++++++
 doc/guides/nics/features/default.ini
|  1 +
 doc/guides/prog_guide/switch_representation.
rst | 10 ++++++++++
 lib/ethdev/rte_ethdev.c
|  1 +
 lib/ethdev/rte_ethdev.h
|  7 +++++++
 5 files changed, 30 insertions(+)
diff --git a/doc/guides/nics/features.rst
b/doc/guides/nics/features.rst index
a96e12d155..2e2a9b1554 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -624,6 +624,17 @@ Supports inner packet L4
checksum.
   ``tx_offload_capa,tx_queue_offload_capa:DE
V_TX_OFFLOAD_OUTER_UDP_CKSUM``.


+.. _nic_features_shared_rx_queue:
+
+Shared Rx queue
+---------------
+
+Supports shared Rx queue for ports in same
switch domain.
+
+* **[uses]
rte_eth_rxconf,rte_eth_rxmode**:
``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
+* **[provides] mbuf**: ``mbuf.port``.
+
+
 .. _nic_features_packet_type_parsing:

 Packet type parsing
diff --git
a/doc/guides/nics/features/default.ini
b/doc/guides/nics/features/default.ini
index 754184ddd4..ebeb4c1851 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -19,6 +19,7 @@ Free Tx mbuf on demand =
 Queue start/stop     =
 Runtime Rx queue setup =
 Runtime Tx queue setup =
+Shared Rx queue      =
 Burst mode info      =
 Power mgmt address monitor =
 MTU update           =
diff --git
a/doc/guides/prog_guide/switch_representation
.rst
b/doc/guides/prog_guide/switch_representation
.rst
index ff6aa91c80..45bf5a3a10 100644
---
a/doc/guides/prog_guide/switch_representation
.rst
+++
b/doc/guides/prog_guide/switch_representation
.rst
@@ -123,6 +123,16 @@ thought as a software
"patch panel" front-end for applications.
 .. [1] `Ethernet switch device driver model
(switchdev)

<https://www.kernel.org/doc/Documentation/net
working/switchdev.txt
quoted
`_
+- Memory usage of representors is huge when
number of representor
+grows,
+  because PMD always allocate mbuf for each
descriptor of Rx queue.
+  Polling the large number of ports brings
more CPU load, cache
+miss and
+  latency. Shared Rx queue can be used to
share Rx queue between
+PF and
+  representors in same switch domain.
+``RTE_ETH_RX_OFFLOAD_SHARED_RXQ``
+  is present in Rx offloading capability of
device info. Setting
+the
+  offloading flag in device Rx mode or Rx
queue configuration to
+enable
+  shared Rx queue. Polling any member port
of shared Rx queue can
+return
+  packets of all ports in group, port ID is
saved in ``mbuf.port``.
+
 Basic SR-IOV
 ------------
diff --git a/lib/ethdev/rte_ethdev.c
b/lib/ethdev/rte_ethdev.c
index 9d95cd11e1..1361ff759a 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -127,6 +127,7 @@ static const struct {
        RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSU
M),
        RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
        RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPL
IT),
+
RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
 };

 #undef RTE_RX_OFFLOAD_BIT2STR
diff --git a/lib/ethdev/rte_ethdev.h
b/lib/ethdev/rte_ethdev.h
index d2b27c351f..a578c9db9d 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
        uint8_t rx_drop_en; /**< Drop packets
if no descriptors are available. */
        uint8_t rx_deferred_start; /**< Do
not start queue with rte_eth_dev_start(). */
        uint16_t rx_nseg; /**< Number of
descriptions in rx_seg array.
*/
+       uint32_t shared_group; /**< Shared
port group index in
+ switch domain. */
        /**
         * Per-queue Rx offloads to be set
using DEV_RX_OFFLOAD_* flags.
         * Only offloads set on
rx_queue_offload_capa or
rx_offload_capa @@ -1373,6 +1374,12 @@ struct
rte_eth_conf {
#define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM
0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH
0x00080000
 #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
0x00100000
+/**
+ * Rx queue is shared among ports in same
switch domain to save
+memory,
+ * avoid polling each port. Any port in
group can be used to receive packets.
+ * Real source port number saved in mbuf-
quoted
port field.
+ */
+#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ
0x00200000

 #define DEV_RX_OFFLOAD_CHECKSUM
(DEV_RX_OFFLOAD_IPV4_CKSUM | \
                                 DEV_RX_OFFLO
AD_UDP_CKSUM | \
--
2.25.1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help