Re: [PATCH v2] vsock: replace virtio_vsock_pkt with sk_buff

From: Cong Wang <hidden>
Date: 2022-10-15 19:50:10
Also in: kvm, lkml, virtualization

On Mon, Oct 03, 2022 at 12:11:39AM +0000, Bobby Eshleman wrote:

On Thu, Oct 06, 2022 at 09:34:10AM +0200, Stefano Garzarella wrote:

quoted

On Thu, Oct 06, 2022 at 03:08:12AM -0400, Michael S. Tsirkin wrote:

quoted

On Wed, Oct 05, 2022 at 06:19:44PM -0700, Bobby Eshleman wrote:

quoted

This patch replaces the struct virtio_vsock_pkt with struct sk_buff.

Using sk_buff in vsock benefits it by a) allowing vsock to be extended
for socket-related features like sockmap, b) vsock may in the future
use other sk_buff-dependent kernel capabilities, and c) vsock shares
commonality with other socket types.

This patch is taken from the original series found here:
https://lore.kernel.org/all/cover.1660362668.git.bobby.eshleman@bytedance.com/ (local)

Small-sized packet throughput improved by ~5% (from 18.53 Mb/s to 19.51
Mb/s). Tested using uperf, 16B payloads, 64 threads, 100s, averaged from
10 test runs (n=10). This improvement is likely due to packet merging.

Large-sized packet throughput decreases ~9% (from 27.25 Gb/s to 25.04
Gb/s). Tested using uperf, 64KB payloads, 64 threads, 100s, averaged
from 10 test runs (n=10).

Medium-sized packet throughput decreases ~5% (from 4.0 Gb/s to 3.81
Gb/s). Tested using uperf, 4k to 8k payload sizes picked randomly
according to normal distribution, 64 threads, 100s, averaged from 10
test runs (n=10).

It is surprizing to me that the original vsock code managed to outperform
the new one, given that to my knowledge we did not focus on optimizing it.

Yeah mee to.

Indeed.

quoted

From this numbers maybe the allocation cost has been reduced as it performs
better with small packets. But with medium to large packets we perform
worse, perhaps because previously we were allocating a contiguous buffer up
to 64k?
Instead alloc_skb() could allocate non-contiguous pages ? (which would solve
the problems we saw a few days ago)

I think this would be the case with alloc_skb_with_frags(), but
internally alloc_skb() uses kmalloc() for the payload and sk_buff_head
slab allocations for the sk_buff itself (all the more confusing to me,
as the prior allocator also uses two separate allocations per packet).

I think it is related to your implementation of
virtio_transport_add_to_queue(), where you introduced much more
complicated logic than before:

-	spin_lock_bh(&vsock->send_pkt_list_lock);
-	list_add_tail(&pkt->list, &vsock->send_pkt_list);
-	spin_unlock_bh(&vsock->send_pkt_list_lock);
-
+	virtio_transport_add_to_queue(&vsock->send_pkt_queue, skb);

A simple list_add_tail() is definitely faster than your
virtio_transport_skbs_can_merge() check. So, why do you have to merge
skb while we don't merge virtio_vsock_pkt?

_If_ you are trying to mimic TCP, I think you are doing it wrong, it can
be much more efficient if you could do the merge in sendmsg() before skb
is even allocated, see tcp_sendmsg_locked().

Thanks.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help