Thread (30 messages) 30 messages, 3 authors, 2023-07-27

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2023-07-25 13:07:27
Also in: kvm, lkml, virtualization

On Tue, Jul 25, 2023 at 02:53:39PM +0200, Stefano Garzarella wrote:
On Tue, Jul 25, 2023 at 07:50:53AM -0400, Michael S. Tsirkin wrote:
quoted
On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:
quoted

On 21.07.2023 00:42, Arseniy Krasnov wrote:
quoted
This adds handling of MSG_ZEROCOPY flag on transmission path: if this
flag is set and zerocopy transmission is possible (enabled in socket
options and transport allows zerocopy), then non-linear skb will be
created and filled with the pages of user's buffer. Pages of user's
buffer are locked in memory by 'get_user_pages()'. Second thing that
this patch does is replace type of skb owning: instead of calling
'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this
change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc'
of socket, so to decrease this field correctly proper skb destructor is
needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.

Signed-off-by: Arseniy Krasnov <redacted>
---
 Changelog:
 v5(big patchset) -> v1:
  * Refactorings of 'if' conditions.
  * Remove extra blank line.
  * Remove 'frag_off' field unneeded init.
  * Add function 'virtio_transport_fill_skb()' which fills both linear
    and non-linear skb with provided data.
 v1 -> v2:
  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
 v2 -> v3:
  * Add new transport callback: 'msgzerocopy_check_iov'. It checks that
    provided 'iov_iter' with data could be sent in a zerocopy mode.
    If this callback is not set in transport - transport allows to send
    any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true'
    then zerocopy is allowed. Reason of this callback is that in case of
    G2H transmission we insert whole skb to the tx virtio queue and such
    skb must fit to the size of the virtio queue to be sent in a single
    iteration (may be tx logic in 'virtio_transport.c' could be reworked
    as in vhost to support partial send of current skb). This callback
    will be enabled only for G2H path. For details pls see comment
    'Check that tx queue...' below.

 include/net/af_vsock.h                  |   3 +
 net/vmw_vsock/virtio_transport.c        |  39 ++++
 net/vmw_vsock/virtio_transport_common.c | 257 ++++++++++++++++++------
 3 files changed, 241 insertions(+), 58 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..a6b346eeeb8e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -177,6 +177,9 @@ struct vsock_transport {

 	/* Read a single skb */
 	int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
+
+	/* Zero-copy. */
+	bool (*msgzerocopy_check_iov)(const struct iov_iter *);
 };

 /**** CORE ****/
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 7bbcc8093e51..23cb8ed638c4 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 	queue_work(virtio_vsock_workqueue, &vsock->rx_work);
 }

+static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
+{
+	struct virtio_vsock *vsock;
+	bool res = false;
+
+	rcu_read_lock();
+
+	vsock = rcu_dereference(the_virtio_vsock);
+	if (vsock) {
+		struct virtqueue *vq;
+		int iov_pages;
+
+		vq = vsock->vqs[VSOCK_VQ_TX];
+
+		iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
+
+		/* Check that tx queue is large enough to keep whole
+		 * data to send. This is needed, because when there is
+		 * not enough free space in the queue, current skb to
+		 * send will be reinserted to the head of tx list of
+		 * the socket to retry transmission later, so if skb
+		 * is bigger than whole queue, it will be reinserted
+		 * again and again, thus blocking other skbs to be sent.
+		 * Each page of the user provided buffer will be added
+		 * as a single buffer to the tx virtqueue, so compare
+		 * number of pages against maximum capacity of the queue.
+		 * +1 means buffer for the packet header.
+		 */
+		if (iov_pages + 1 <= vq->num_max)
I think this check is actual only for case one we don't have indirect buffer feature.
With indirect mode whole data to send will be packed into one indirect buffer.

Thanks, Arseniy
Actually the reverse. With indirect you are limited to num_max.
Without you are limited to whatever space is left in the
queue (which you did not check here, so you should).

quoted
quoted
+			res = true;
+	}
+
+	rcu_read_unlock();
Just curious:
is the point of all this RCU dance to allow vsock
to change from under us? then why is it ok to
have it change? the virtio_transport_msgzerocopy_check_iov
will then refer to the old vsock ...
IIRC we introduced the RCU to handle hot-unplug issues:
commit 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on
the_virtio_vsock")

When we remove the device, we flush all the works, etc. so we should
not be in this case (referring the old vsock), except for an irrelevant
transient as the device is disappearing.

Stefano
what if old device goes away then new one appears?

-- 
MST
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help