Thread (69 messages) 69 messages, 5 authors, 2019-10-14

Re: [PATCH v4 1/5] vsock/virtio: limit the memory used per-socket

From: Stefan Hajnoczi <stefanha@redhat.com>
Date: 2019-10-14 08:17:30
Also in: kvm, lkml

On Fri, Oct 11, 2019 at 03:40:48PM +0200, Stefano Garzarella wrote:
On Sun, Sep 1, 2019 at 8:56 AM Michael S. Tsirkin [off-list ref] wrote:
quoted
On Fri, Aug 30, 2019 at 11:40:59AM +0200, Stefano Garzarella wrote:
quoted
On Mon, Jul 29, 2019 at 10:04:29AM -0400, Michael S. Tsirkin wrote:
quoted
On Wed, Jul 17, 2019 at 01:30:26PM +0200, Stefano Garzarella wrote:
quoted
Since virtio-vsock was introduced, the buffers filled by the host
and pushed to the guest using the vring, are directly queued in
a per-socket list. These buffers are preallocated by the guest
with a fixed size (4 KB).

The maximum amount of memory used by each socket should be
controlled by the credit mechanism.
The default credit available per-socket is 256 KB, but if we use
only 1 byte per packet, the guest can queue up to 262144 of 4 KB
buffers, using up to 1 GB of memory per-socket. In addition, the
guest will continue to fill the vring with new 4 KB free buffers
to avoid starvation of other sockets.

This patch mitigates this issue copying the payload of small
packets (< 128 bytes) into the buffer of last packet queued, in
order to avoid wasting memory.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This is good enough for net-next, but for net I think we
should figure out how to address the issue completely.
Can we make the accounting precise? What happens to
performance if we do?
Since I'm back from holidays, I'm restarting this thread to figure out
how to address the issue completely.

I did a better analysis of the credit mechanism that we implemented in
virtio-vsock to get a clearer view and I'd share it with you:

    This issue affect only the "host->guest" path. In this case, when the
    host wants to send a packet to the guest, it uses a "free" buffer
    allocated by the guest (4KB).
    The "free" buffers available for the host are shared between all
    sockets, instead, the credit mechanism is per-socket, I think to
    avoid the starvation of others sockets.
    The guests re-fill the "free" queue when the available buffers are
    less than half.

    Each peer have these variables in the per-socket state:
       /* local vars */
       buf_alloc        /* max bytes usable by this socket
                           [exposed to the other peer] */
       fwd_cnt          /* increased when RX packet is consumed by the
                           user space [exposed to the other peer] */
       tx_cnt                 /* increased when TX packet is sent to the other peer */

       /* remote vars  */
       peer_buf_alloc   /* peer's buf_alloc */
       peer_fwd_cnt     /* peer's fwd_cnt */

    When a peer sends a packet, it increases the 'tx_cnt'; when the
    receiver consumes the packet (copy it to the user-space buffer), it
    increases the 'fwd_cnt'.
    Note: increments are made considering the payload length and not the
    buffer length.

    The value of 'buf_alloc' and 'fwd_cnt' are sent to the other peer in
    all packet headers or with an explicit CREDIT_UPDATE packet.

    The local 'buf_alloc' value can be modified by the user space using
    setsockopt() with optname=SO_VM_SOCKETS_BUFFER_SIZE.

    Before to send a packet, the peer checks the space available:
      credit_available = peer_buf_alloc - (tx_cnt - peer_fwd_cnt)
    and it will send up to credit_available bytes to the other peer.

Possible solutions considering Michael's advice:
1. Use the buffer length instead of the payload length when we increment
   the counters:
  - This approach will account precisely the memory used per socket.
  - This requires changes in both guest and host.
  - It is not compatible with old drivers, so a feature should be negotiated.
2. Decrease the advertised 'buf_alloc' taking count of bytes queued in
   the socket queue but not used. (e.g. 256 byte used on 4K available in
   the buffer)
  - pkt->hdr.buf_alloc = buf_alloc - bytes_not_used.
  - This should be compatible also with old drivers.

Maybe the second is less invasive, but will it be too tricky?
Any other advice or suggestions?

Thanks in advance,
Stefano
OK let me try to clarify.  The idea is this:

Let's say we queue a buffer of 4K, and we copy if len < 128 bytes.  This
means that in the worst case (128 byte packets), each byte of credit in
the socket uses up 4K/128 = 16 bytes of kernel memory. In fact we need
to also account for the virtio_vsock_pkt since I think it's kept around
until userspace consumes it.

Thus given X buf alloc allowed in the socket, we should publish X/16
credits to the other side. This will ensure the other side does not send
more than X/16 bytes for a given socket and thus we won't need to
allocate more than X bytes to hold the data.

We can play with the copy break value to tweak this.
Hi Michael,
sorry for the long silence, but I focused on multi-transport.

Before to implement your idea, I tried to do some calculations and
looking better to our credit mechanism:

  buf_alloc = 256 KB (default, tunable through setsockopt)
  sizeof(struct virtio_vsock_pkt) = 128

  - guest (we use preallocated 4 KB buffers to receive packets, copying
    small packet - < 128 -)
    worst_case = 129
    buf_size = 4 KB
    credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 32

    credit_published = buf_alloc / credit2mem = ~8 KB
    Space for just 2 full packet (4 KB)

  - host (we copy packets from the vring, allocating the space for the payload)
    worst_case = 1
    buf_size = 1
    credit2mem = (buf_size + sizeof(struct virtio_vsock_pkt)) / worst_case = 129

    credit_published = buf_alloc / credit2mem = ~2 KB
    Less than a full packet (guest now can send up to 64 KB with a single
    packet, so it will be limited to 2 KB)

Current memory consumption in the worst case if the RX queue is full:
  - guest
    mem = (buf_alloc / worst_case) *
          (buf_size + sizeof(struct virtio_vsock_pkt) = ~8MB

  - host
    mem = (buf_alloc / worst_case) *
          (buf_size + sizeof(struct virtio_vsock_pkt) = ~32MB

I think that the performance with big packets will be affected,
but I still have to try.

Another approach that I want to explore is to play with buf_alloc
published to the peer.

One thing that's not clear to me yet is the meaning of
SO_VM_SOCKETS_BUFFER_SIZE:
- max amount of memory used in the RX queue
- max amount of payload bytes in the RX queue (without overhead of
  struct virtio_vsock_pkt + preallocated buffer)

From the 'include/uapi/linux/vm_sockets.h':
    /* Option name for STREAM socket buffer size.  Use as the option name in
     * setsockopt(3) or getsockopt(3) to set or get an unsigned long long that
     * specifies the size of the buffer underlying a vSockets STREAM socket.
     * Value is clamped to the MIN and MAX.
     */

    #define SO_VM_SOCKETS_BUFFER_SIZE 0

Regardless, I think we need to limit memory consumption in some way.
I'll check the implementation of other transports, to understand better.
SO_VM_SOCKETS_BUFFER_SIZE might have been useful for VMCI-specific
applications, but we should use SO_RCVBUF and SO_SNDBUF for portable
applications in the future.  Those socket options also work with other
address families.

I guess these sockopts are bypassed by AF_VSOCK because it doesn't use
the common skb queuing code in net/core/sock.c :(.  But one day we might
migrate to it...

Stefan

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help