Thread (3 messages) 3 messages, 2 authors, 2022-08-02

Re: [RFC PATCH v2 0/9] vsock: updates for SO_RCVLOWAT handling

From: Vishnu Dasa via Virtualization <hidden>
Date: 2022-08-02 05:35:31
Also in: kvm, linux-hyperv, lkml, netdev

On Jul 27, 2022, at 5:37 AM, Stefano Garzarella [off-list ref] wrote:

Hi Arseniy,

On Mon, Jul 25, 2022 at 07:54:05AM +0000, Arseniy Krasnov wrote:
quoted
Hello,

This patchset includes some updates for SO_RCVLOWAT:

1) af_vsock:
 During my experiments with zerocopy receive, i found, that in some
 cases, poll() implementation violates POSIX: when socket has non-
 default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
 POLLRDNORM bits in 'revents' even number of bytes available to read
 on socket is smaller than SO_RCVLOWAT value. In this case,user sees
 POLLIN flag and then tries to read data(for example using  'read()'
 call), but read call will be blocked, because  SO_RCVLOWAT logic is
 supported in dequeue loop in af_vsock.c. But the same time,  POSIX
 requires that:

 "POLLIN     Data other than high-priority data may be read without
             blocking.
  POLLRDNORM Normal data may be read without blocking."

 See https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.open-std.org%2Fjtc1%2Fsc22%2Fopen%2Fn4217.pdf&amp;data=05%7C01%7Cvdasa%40vmware.com%7C5ad2c6759fd8439e938708da6fccbee4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637945222450930014%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=60hG3DiYufOCv1DuufSdujiLEKDNou1Ztyah3GPbOLI%3D&amp;reserved=0, page 293.

 So, we have, that poll() syscall returns POLLIN, but read call will
 be blocked.

 Also in man page socket(7) i found that:

 "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
 socket as readable only if at least SO_RCVLOWAT bytes are available."

 I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
 uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
 this case for TCP socket, it works as POSIX required.

 I've added some fixes to af_vsock.c and virtio_transport_common.c,
 test is also implemented.

2) virtio/vsock:
 It adds some optimization to wake ups, when new data arrived. Now,
 SO_RCVLOWAT is considered before wake up sleepers who wait new data.
 There is no sense, to kick waiter, when number of available bytes
 in socket's queue < SO_RCVLOWAT, because if we wake up reader in
 this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
 or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
 exit from poll() will be "spurious". This logic is also used in TCP
 sockets.
Nice, it looks good!
quoted
3) vmci/vsock:
 Same as 2), but i'm not sure about this changes. Will be very good,
 to get comments from someone who knows this code.
I CCed VMCI maintainers to the patch and also to this cover, maybe
better to keep them in the loop for next versions.

(Jorgen's and Rajesh's emails bounced back, so I'm CCing here only
Bryan, Vishnu, and pv-drivers@vmware.com)
Hi Stefano,
Jorgen and Rajesh are no longer with VMware.  There's a patch in
flight to remove Rajesh from the MAINTAINERS file (Jorgen is already
removed).

Thanks,
Vishnu
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help