Thread (50 messages) 50 messages, 7 authors, 2021-07-21

Re: [dpdk-dev] [PATCH v5 3/4] vhost: support async dequeue for split ring

From: Hu, Jiayu <hidden>
Date: 2021-07-16 01:11:03

Hi, Maxime,
-----Original Message-----
From: Maxime Coquelin <redacted>
Sent: Thursday, July 15, 2021 9:18 PM
To: Hu, Jiayu <redacted>; Ma, WenwuX <redacted>;
dev@dpdk.org
Cc: Xia, Chenbo <redacted>; Jiang, Cheng1
[off-list ref]; Wang, YuanX [off-list ref]
Subject: Re: [PATCH v5 3/4] vhost: support async dequeue for split ring



On 7/14/21 8:50 AM, Hu, Jiayu wrote:
quoted
Hi Maxime,

Thanks for your comments. Applies are inline.
quoted
-----Original Message-----
From: Maxime Coquelin <redacted>
Sent: Tuesday, July 13, 2021 10:30 PM
To: Ma, WenwuX <redacted>; dev@dpdk.org
Cc: Xia, Chenbo <redacted>; Jiang, Cheng1
[off-list ref]; Hu, Jiayu [off-list ref]; Wang, YuanX
[off-list ref]
Subject: Re: [PATCH v5 3/4] vhost: support async dequeue for split
ring
quoted
 struct async_inflight_info {
 	struct rte_mbuf *mbuf;
-	uint16_t descs; /* num of descs inflight */
+	union {
+		uint16_t descs; /* num of descs in-flight */
+		struct async_nethdr nethdr;
+	};
 	uint16_t nr_buffers; /* num of buffers inflight for packed ring */
-};
+} __rte_cache_aligned;
Does it really need to be cache aligned?
How about changing to 32-byte align? So a cacheline can hold 2 objects.
Or not forcing any alignment at all? Would there really be a performance
regression?
quoted
quoted
quoted
 /**
  *  dma channel feature bit definition @@ -193,4 +201,34 @@
__rte_experimental  uint16_t rte_vhost_poll_enqueue_completed(int
vid, uint16_t queue_id,
 		struct rte_mbuf **pkts, uint16_t count);

+/**
+ * This function tries to receive packets from the guest with
+offloading
+ * large copies to the DMA engine. Successfully dequeued packets
+are
+ * transfer completed, either by the CPU or the DMA engine, and
+they are
+ * returned in "pkts". There may be other packets that are sent
+from
+ * the guest but being transferred by the DMA engine, called
+in-flight
+ * packets. The amount of in-flight packets by now is returned in
+ * "nr_inflight". This function will return in-flight packets only
+after
+ * the DMA engine finishes transferring.
I am not sure to understand that comment. Is it still "in-flight" if
the DMA transfer is completed?
"in-flight" means packet copies are submitted to the DMA, but the DMA
hasn't completed copies.
quoted
Are we ensuring packets are not reordered with this way of working?
There is a threshold can be set by users. If set it to 0, which
presents all packet copies assigned to the DMA, the packets sent from
the guest will not be reordered.
Reordering packets is bad in my opinion. We cannot expect the user to know
that he should set the threshold to zero to have packets ordered.

Maybe we should consider not having threshold, and so have every
descriptors handled either by the CPU (sync datapath) or by the DMA (async
datapath). Doing so would simplify a lot the code, and would make
performance/latency more predictable.

I understand that we might not get the best performance for every packet
size doing that, but that may be a tradeoff we would make to have the
feature maintainable and easily useable by the user.
I understand and agree in some way. But before changing the existed design
in async enqueue and dequeue, we need more careful tests, as current design
is well validated and performance looks good. So I suggest to do it in 21.11.

Thanks,
Jiayu
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help