Thread (266 messages, 14 authors, 2021-11-05)

Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

From: Xueming(Steven) Li <hidden>
Date: 2021-08-12 14:35:29

-----Original Message-----
From: dev <redacted> On Behalf Of Xueming(Steven) Li
Sent: Wednesday, August 11, 2021 8:59 PM
To: Ferruh Yigit <redacted>; Jerin Jacob <redacted>
Cc: dpdk-dev <redacted>; NBU-Contact-Thomas Monjalon <redacted>; Andrew Rybchenko
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue


-----Original Message-----
From: Ferruh Yigit <redacted>
Sent: Wednesday, August 11, 2021 8:04 PM
To: Xueming(Steven) Li <redacted>; Jerin Jacob
[off-list ref]
Cc: dpdk-dev <redacted>; NBU-Contact-Thomas Monjalon
[off-list ref]; Andrew Rybchenko
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

On 8/11/2021 9:28 AM, Xueming(Steven) Li wrote:
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Wednesday, August 11, 2021 4:03 PM
To: Xueming(Steven) Li <redacted>
Cc: dpdk-dev <redacted>; Ferruh Yigit <redacted>;
NBU-Contact-Thomas Monjalon [off-list ref]; Andrew Rybchenko
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

On Mon, Aug 9, 2021 at 7:46 PM Xueming(Steven) Li [off-list ref] wrote:
Hi,
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Monday, August 9, 2021 9:51 PM
To: Xueming(Steven) Li <redacted>
Cc: dpdk-dev <redacted>; Ferruh Yigit
[off-list ref]; NBU-Contact-Thomas Monjalon
[off-list ref]; Andrew Rybchenko
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue

On Mon, Aug 9, 2021 at 5:18 PM Xueming Li [off-list ref] wrote:
In the current DPDK framework, each Rx queue is pre-loaded with
mbufs for incoming packets. When the number of representors scales
out in a switch domain, the memory consumption becomes
significant. More importantly, polling all ports leads to a high
cache-miss rate, high latency and low throughput.

This patch introduces a shared Rx queue. Ports with the same
configuration in a switch domain can share an Rx queue set by specifying a sharing group.
Polling any member port's queue that maps to the same shared Rx queue receives packets
from all member ports. The source port is identified by mbuf->port.

The number of queues of each port in a shared group should be identical. Queue
indexes are mapped 1:1 within the shared group.

A shared Rx queue is supposed to be polled on the same thread.

Multiple groups are supported by group ID.
Is this offload specific to the representor? If so, can the name be changed to be representor-specific?
Yes, the PF and representors in a switch domain can take advantage of it.
If it is for the generic case, how will flow ordering be maintained?
I'm not quite sure I understood your question. The control path
is almost the same as before: the PF and representor ports are still needed, and rte_flow rules are not impacted.
Queues are still needed for each member port; in my PMD implementation, descriptors (mbufs)
are supplied from the shared Rx queue.
My question was: if we create a generic RTE_ETH_RX_OFFLOAD_SHARED_RXQ
offload, multiple ethdev receive queues land in the same receive
queue. In that case, how is the flow order maintained for the
respective receive queues?
I guess the question is about the testpmd forwarding stream? The forwarding logic has to change slightly in the shared Rx queue case:
for each packet in the rx_burst result, look up the source stream according to mbuf->port and forward to the target forwarding stream (fs).
Packets from the same source port can be grouped into a small burst for
processing, which improves performance when traffic comes from a
limited number of ports. I'll introduce a common API for shared Rx queue
forwarding that takes a packet-handling callback, so it suits all
forwarding engines. I will send the patches soon.
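A sketch of the per-source-port grouping described above, assuming a hypothetical `struct pkt` that stands in for `struct rte_mbuf` (only the `port` field matters here) and a hypothetical callback type `sub_burst_cb`; this is not the testpmd API introduced by the follow-up patches. `shared_rxq_dispatch()` splits one burst into runs of consecutive packets from the same port, so the order in which the shared queue delivered packets is never changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf; only the source port matters. */
struct pkt { uint16_t port; };

/* Called once per run of consecutive packets from the same source port. */
typedef void (*sub_burst_cb)(uint16_t src_port, struct pkt **pkts,
                             uint16_t nb_pkts, void *ctx);

/* Split one rx_burst result from a shared Rx queue into runs of consecutive
 * packets with the same source port and hand each run to cb. Packets are
 * never reordered, so the per-port order from the queue is preserved. */
static void shared_rxq_dispatch(struct pkt **pkts, uint16_t nb_rx,
                                sub_burst_cb cb, void *ctx)
{
    unsigned int start = 0;

    assert(cb != NULL);
    for (unsigned int i = 1; i <= nb_rx; i++) {
        /* Close the current run at the end of the burst or on a port change. */
        if (i == nb_rx || pkts[i]->port != pkts[start]->port) {
            cb(pkts[start]->port, &pkts[start], (uint16_t)(i - start), ctx);
            start = i;
        }
    }
}

/* Example callback: count packets per port and the number of runs seen. */
#define TALLY_PORTS 8
struct port_tally { uint32_t pkts[TALLY_PORTS]; uint32_t runs; };

static void tally_cb(uint16_t src_port, struct pkt **pkts, uint16_t nb_pkts,
                     void *ctx)
{
    struct port_tally *t = ctx;

    (void)pkts; /* a forwarding engine would process the packets here */
    if (src_port < TALLY_PORTS)
        t->pkts[src_port] += nb_pkts;
    t->runs++;
}
```

Grouping only consecutive packets keeps the loop simple and order-preserving; a stable partition by port would yield fewer, larger sub-bursts at the cost of an extra pass and scratch space.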
All ports will put the packets into the same queue (shared queue),
right? Does this mean only a single core will poll? What will happen if multiple cores poll it; won't that cause a problem?
This has been mentioned in the commit log: the shared Rx queue is supposed to be polled in a single thread (core); I think it should be "MUST".
The result is unexpected if multiple cores poll it, which is why I added a polling schedule check in testpmd.
V2 with the testpmd code has been uploaded, please check.
As with the rx/tx burst functions, a queue can't be polled on multiple threads (cores), and for performance reasons there is no such check in the EAL
API.

If users want to utilize multiple cores to distribute workloads, it's possible to define more groups; queues in different groups can be
polled on multiple cores.

It's possible to poll every member port in a group, but not necessary: any port in the group can be polled to get packets for all ports in
the group.

If member ports are subject to hot plug/remove, it's possible to create a vdev with the same queue number, copy the rxq object and poll the vdev
as a dedicated proxy for the group.
And if this requires specific changes in the application, I am not
sure about the solution. Can't this work in a way that is transparent to the application?
Yes, we considered different options in the design stage. One possible solution is to cache received packets in rings; this can be done at the
ethdev layer, but I'm afraid the benefits are smaller, and the user still has to be aware of multi-core polling.
This can be done later as a wrapper PMD, with more effort.
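The ring-caching alternative could look roughly like the sketch below: one shared-queue burst is demultiplexed into per-port FIFOs so that each member port can be polled on its own, transparently to the application. All names (`struct pkt`, `struct pkt_ring`, `demux_to_rings()`, `port_rx_burst()`) are hypothetical, the rings are plain single-threaded FIFOs rather than `rte_ring`, and dropped packets are only counted, where a real implementation would have to free them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4
#define RING_SIZE 8            /* power of two: per-port capacity */

struct pkt { uint16_t port; }; /* hypothetical stand-in for struct rte_mbuf */

/* Minimal FIFO, one per member port. Not thread-safe: it still relies on
 * the rule that a shared Rx queue is polled from a single thread. */
struct pkt_ring {
    struct pkt *slot[RING_SIZE];
    uint32_t head, tail;       /* head: next write, tail: next read */
};

static bool ring_put(struct pkt_ring *r, struct pkt *p)
{
    if (r->head - r->tail == RING_SIZE)
        return false;          /* full */
    r->slot[r->head++ & (RING_SIZE - 1)] = p;
    return true;
}

static struct pkt *ring_get(struct pkt_ring *r)
{
    if (r->head == r->tail)
        return NULL;           /* empty */
    return r->slot[r->tail++ & (RING_SIZE - 1)];
}

/* Spread one shared-queue burst into per-port rings. Returns how many
 * packets were dropped (unknown port or ring full). */
static unsigned int demux_to_rings(struct pkt_ring rings[MAX_PORTS],
                                   struct pkt **pkts, unsigned int n)
{
    unsigned int dropped = 0;

    for (unsigned int i = 0; i < n; i++) {
        uint16_t port = pkts[i]->port;
        if (port >= MAX_PORTS || !ring_put(&rings[port], pkts[i]))
            dropped++;
    }
    return dropped;
}

/* Per-port receive, as the application would see it. */
static unsigned int port_rx_burst(struct pkt_ring rings[MAX_PORTS],
                                  uint16_t port, struct pkt **out,
                                  unsigned int max)
{
    unsigned int n = 0;

    assert(port < MAX_PORTS);
    while (n < max && (out[n] = ring_get(&rings[port])) != NULL)
        n++;
    return n;
}
```

The extra copy through the rings, and the fact that they are still not safe to drain from several cores, illustrate the concern above: the application keeps the same polling constraint for less benefit.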
Overall, is this for optimizing memory for the port representors? If
so, can't we have a port-representor-specific solution? Reducing the scope can reduce the complexity it brings.
This feature supports both the PF and representors, and yes, the major issue is the memory used by representors. Polling all representors also adds
more core cache-miss latency. This feature essentially aggregates all ports in a group into one port.
On the other hand, it's useful for rte_flow to create offloading flows using a representor as a regular port ID.

Any new solution or suggestion is welcome; my head is buried in PMD code :)
If this offload is only useful for the representor case, can we make
this offload specific to the representor case by changing its name and scope?
It works for both the PF and representors in the same switch domain; applications like OVS need only a few changes.
Signed-off-by: Xueming Li <redacted>
---
 doc/guides/nics/features.rst                    | 11 +++++++++++
 doc/guides/nics/features/default.ini            |  1 +
 doc/guides/prog_guide/switch_representation.rst | 10 ++++++++++
 lib/ethdev/rte_ethdev.c                         |  1 +
 lib/ethdev/rte_ethdev.h                         |  7 +++++++
 5 files changed, 30 insertions(+)
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index a96e12d155..2e2a9b1554 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -624,6 +624,17 @@ Supports inner packet L4 checksum.
   ``tx_offload_capa,tx_queue_offload_capa:DEV_TX_OFFLOAD_OUTER_UDP_CKSUM``.


+.. _nic_features_shared_rx_queue:
+
+Shared Rx queue
+---------------
+
+Supports shared Rx queue for ports in same switch domain.
+
+* **[uses]     rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_SHARED_RXQ``.
+* **[provides] mbuf**: ``mbuf.port``.
+
+
 .. _nic_features_packet_type_parsing:

 Packet type parsing
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index 754184ddd4..ebeb4c1851 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -19,6 +19,7 @@ Free Tx mbuf on demand =
 Queue start/stop     =
 Runtime Rx queue setup =
 Runtime Tx queue setup =
+Shared Rx queue      =
 Burst mode info      =
 Power mgmt address monitor =
 MTU update           =
diff --git a/doc/guides/prog_guide/switch_representation.rst b/doc/guides/prog_guide/switch_representation.rst
index ff6aa91c80..45bf5a3a10 100644
--- a/doc/guides/prog_guide/switch_representation.rst
+++ b/doc/guides/prog_guide/switch_representation.rst
@@ -123,6 +123,16 @@ thought as a software "patch panel" front-end for applications.
 .. [1] `Ethernet switch device driver model (switchdev)
        <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_
+- Memory usage of representors is huge when number of representor grows,
+  because PMD always allocate mbuf for each descriptor of Rx queue.
+  Polling the large number of ports brings more CPU load, cache miss and
+  latency. Shared Rx queue can be used to share Rx queue between PF and
+  representors in same switch domain. ``RTE_ETH_RX_OFFLOAD_SHARED_RXQ``
+  is present in Rx offloading capability of device info. Setting the
+  offloading flag in device Rx mode or Rx queue configuration to enable
+  shared Rx queue. Polling any member port of shared Rx queue can return
+  packets of all ports in group, port ID is saved in ``mbuf.port``.
+
 Basic SR-IOV
 ------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9d95cd11e1..1361ff759a 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -127,6 +127,7 @@ static const struct {
        RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
        RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
        RTE_ETH_RX_OFFLOAD_BIT2STR(BUFFER_SPLIT),
+       RTE_ETH_RX_OFFLOAD_BIT2STR(SHARED_RXQ),
 };

 #undef RTE_RX_OFFLOAD_BIT2STR
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index d2b27c351f..a578c9db9d 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1047,6 +1047,7 @@ struct rte_eth_rxconf {
        uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
        uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
        uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
+       uint32_t shared_group; /**< Shared port group index in switch domain. */
        /**
         * Per-queue Rx offloads to be set using DEV_RX_OFFLOAD_* flags.
         * Only offloads set on rx_queue_offload_capa or rx_offload_capa
@@ -1373,6 +1374,12 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
 #define DEV_RX_OFFLOAD_RSS_HASH                0x00080000
 #define RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT 0x00100000
+/**
+ * Rx queue is shared among ports in same switch domain to save memory,
+ * avoid polling each port. Any port in group can be used to receive packets.
+ * Real source port number saved in mbuf->port field.
+ */
+#define RTE_ETH_RX_OFFLOAD_SHARED_RXQ   0x00200000

 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
                                 DEV_RX_OFFLOAD_UDP_CKSUM | \
--
2.25.1
  