Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
From: Bobby Eshleman <hidden>
Date: 2022-09-07 01:44:53
Also in: kvm, lkml
On Tue, Sep 06, 2022 at 06:58:32AM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> > In order to support usage of qdisc on vsock traffic, this commit
> > introduces a struct net_device to vhost and virtio vsock.
> >
> > Two new devices are created, vhost-vsock for vhost and virtio-vsock
> > for virtio. The devices are attached to the respective transports.
> >
> > To bypass the usage of the device, the user may "down" the
> > associated network interface using common tools. For example,
> > "ip link set dev virtio-vsock down" lets vsock bypass the net_device
> > and qdisc entirely, simply using the FIFO logic of the prior
> > implementation.
> >
> > For both hosts and guests, there is one device for all G2H vsock
> > sockets and one device for all H2G vsock sockets. This makes sense
> > for guests because the driver only supports a single vsock channel
> > (one pair of TX/RX virtqueues), so one device and qdisc fits. For
> > hosts, this may not seem ideal for some workloads. However, it is
> > possible to use a multi-queue qdisc, where a given queue is
> > responsible for a range of sockets. This seems to be a better
> > solution than having one device per socket, which may yield a very
> > large number of devices and qdiscs, all of which are dynamically
> > being created and destroyed. Because of this dynamism, it would also
> > require a complex policy management daemon, as devices would
> > constantly be spun up and down as sockets were created and
> > destroyed. To avoid this, one device and qdisc also applies to all
> > H2G sockets.
> >
> > Signed-off-by: Bobby Eshleman <redacted>
>
> I've been thinking about this generally. vsock currently
> assumes reliability, but with qdisc can't we get
> packet drops e.g. depending on the queueing?
>
> What prevents user from configuring such a discipline?
> One thing people like about vsock is that it's very hard
> to break H2G communication even with misconfigured
> networking.

If the qdisc decides to discard a packet, it returns NET_XMIT_CN via
dev_queue_xmit(). This v1 allows this quietly, but v2 could return
an error to the user (-ENOMEM or maybe -ENOBUFS) when this happens,
similar to what happens today when vsock is unable to enqueue a packet.

The user could still, for example, choose the noop qdisc. Assuming the
v2 change mentioned above, their sendmsg() calls will return errors,
similar to how they get an error when connecting a socket if they
choose the wrong CID.

Best,
Bobby
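
To make the v2 idea above concrete, here is a rough userspace sketch of
how a dev_queue_xmit() return code might be translated into an error for
sendmsg(). This is not code from the patch. The helper names
vsock_xmit_to_errno() and example_noop_send() are hypothetical, the
NET_XMIT_* values are copied here only so the sketch compiles outside the
kernel, and -ENOBUFS is just one of the two candidates mentioned above.

```c
#include <assert.h>
#include <errno.h>

/* Copied from the kernel's qdisc return codes so this builds in userspace. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01 /* skb dropped */
#define NET_XMIT_CN      0x02 /* congestion notification */

/*
 * Hypothetical helper (not part of the patch): turn the value returned
 * by dev_queue_xmit() into 0 or a negative errno for the caller.
 */
static int vsock_xmit_to_errno(int rc)
{
	if (rc < 0)
		return rc; /* already a negative errno, pass it through */

	switch (rc) {
	case NET_XMIT_SUCCESS:
		return 0;
	case NET_XMIT_DROP:
	case NET_XMIT_CN:
	default:
		/* Assumption: report a discard as -ENOBUFS (could be -ENOMEM). */
		return -ENOBUFS;
	}
}

/* Usage sketch: the noop qdisc case, where every packet is discarded. */
static int example_noop_send(void)
{
	int err = vsock_xmit_to_errno(NET_XMIT_CN);

	assert(err <= 0); /* result is always 0 or a negative errno */
	return err;
}
```

With a mapping along these lines, a misconfigured qdisc would show up as
a failing sendmsg() rather than as silent loss, which is the behavior
described for v2 above.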