Thread: 8 messages, 3 authors, 2021-02-03

Re: UDP implementation and the MSG_MORE flag

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2021-01-27 06:02:05

On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn
[off-list ref] wrote:
On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn
[off-list ref] wrote:
On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute [off-list ref] wrote:
Hello,

we observe some unexpected behavior in the UDP implementation of the
linux kernel.

Some UDP packets sent via the loopback interface are dropped in the
kernel on the receive side when using sendto with the MSG_MORE flag.
Every drop increments InCsumErrors in /proc/self/net/snmp. Example
code to reproduce it is appended below.
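To watch the counter mentioned above while running a reproducer, it can be read programmatically. This is a userspace sketch, not part of the report: the names `parse_udp_in_csum_errors` and `read_udp_in_csum_errors` are hypothetical, and it assumes the usual /proc/net/snmp layout of a "Udp:" line of column names followed by a "Udp:" line of values.

```c
#include <stdio.h>
#include <string.h>

/* Copy one line (up to '\n' or NUL) into buf so sscanf() cannot run past it. */
static void copy_line(const char *p, char *buf, size_t n)
{
    size_t i = 0;

    while (p[i] != '\0' && p[i] != '\n' && i + 1 < n) {
        buf[i] = p[i];
        i++;
    }
    buf[i] = '\0';
}

/* Hypothetical helper: return the "InCsumErrors" value from the two "Udp:"
 * lines (column names, then values) of /proc/net/snmp-style text, or -1 if
 * it is not found. "UdpLite:" lines do not match because of the colon. */
long parse_udp_in_csum_errors(const char *text)
{
    const char *hdr = NULL, *val = NULL, *p = text;
    char hbuf[1024], vbuf[1024];

    while (p != NULL && *p != '\0') {
        if (strncmp(p, "Udp:", 4) == 0) {
            if (hdr == NULL) {
                hdr = p;
            } else {
                val = p;
                break;
            }
        }
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    if (hdr == NULL || val == NULL)
        return -1;

    copy_line(hdr, hbuf, sizeof(hbuf));
    copy_line(val, vbuf, sizeof(vbuf));

    /* Walk column names and values in lockstep, skipping the "Udp:" prefix. */
    const char *h = hbuf + 4, *v = vbuf + 4;
    for (;;) {
        char name[64];
        long num;
        int hn, vn;

        if (sscanf(h, "%63s%n", name, &hn) != 1 ||
            sscanf(v, "%ld%n", &num, &vn) != 1)
            return -1;
        if (strcmp(name, "InCsumErrors") == 0)
            return num;
        h += hn;
        v += vn;
    }
}

/* Hypothetical helper: read the counter from a file such as
 * "/proc/self/net/snmp" (current network namespace). Returns -1 on error. */
long read_udp_in_csum_errors(const char *path)
{
    char text[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    n = fread(text, 1, sizeof(text) - 1, f);
    fclose(f);
    text[n] = '\0';
    return parse_udp_in_csum_errors(text);
}
```

Reading the value before and after a run and subtracting gives the number of UDP checksum errors the namespace recorded in between.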

We tracked it down to this code section (and a little further, but
it's unclear to me why the csum() is wrong in the bad case):

udpv6_recvmsg()
...
        if (checksum_valid || udp_skb_csum_unnecessary(skb)) {
                if (udp_skb_is_linear(skb))
                        err = copy_linear_skb(skb, copied, off, &msg->msg_iter);
                else
                        err = skb_copy_datagram_msg(skb, off, msg, copied);
        } else {
                err = skb_copy_and_csum_datagram_msg(skb, off, msg);
                if (err == -EINVAL)
                        goto csum_copy_err;
        }
...
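The else branch in that excerpt copies the payload and verifies the checksum in a single pass, bailing out with -EINVAL when the sum does not come out right. A minimal userspace model of the idea, assuming a flat buffer instead of an skb and a caller-supplied `seed` standing in for the pseudo-header sum (`csum_fold16`, `csum_compute` and `copy_and_csum_verify` are hypothetical names, not kernel functions):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide one's complement accumulator down to 16 bits (end-around carry). */
uint16_t csum_fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Value to store in a 16-bit checksum field (which must be zero while this
 * runs): the complement of the folded sum of `seed` plus the bytes, read as
 * big-endian 16-bit words. */
uint16_t csum_compute(const uint8_t *p, size_t len, uint64_t seed)
{
    uint64_t sum = seed;

    for (size_t i = 0; i < len; i++)
        sum += (i & 1) ? p[i] : ((uint64_t)p[i] << 8);
    return (uint16_t)~csum_fold16(sum);
}

/* Copy `len` bytes and accumulate the checksum in the same pass, starting
 * from `seed`. A correctly checksummed packet folds to 0xFFFF. The kernel's
 * csum_fold() returns the complement, so there the test is
 * "csum_fold(csum) != 0" instead. */
int copy_and_csum_verify(uint8_t *dst, const uint8_t *src, size_t len,
                         uint64_t seed)
{
    uint64_t sum = seed;

    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
        sum += (i & 1) ? src[i] : ((uint64_t)src[i] << 8);
    }
    return csum_fold16(sum) == 0xFFFF ? 0 : -EINVAL;
}
```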
Thanks for the report with a full reproducer.

I don't have a full answer yet, but can reproduce this easily.

The third program, without MSG_MORE, builds an skb with
CHECKSUM_PARTIAL in __ip_append_data. When the packet is looped back to
the receive path, that ip_summed value means no additional validation
is needed, as encoded in skb_csum_unnecessary().
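The skb_csum_unnecessary() decision can be modeled in userspace to see where the programs diverge. This is a sketch assuming a hypothetical `struct fake_skb` that carries only the three inputs the check reads; the enum names and values mirror the kernel's ip_summed constants, but nothing here is kernel code.

```c
#include <stdbool.h>

/* Mirrors the kernel's ip_summed constants (values as in include/linux/skbuff.h). */
enum ip_summed {
    CHECKSUM_NONE = 0,
    CHECKSUM_UNNECESSARY = 1,
    CHECKSUM_COMPLETE = 2,
    CHECKSUM_PARTIAL = 3,
};

/* Hypothetical stand-in for the few sk_buff fields the check looks at. */
struct fake_skb {
    enum ip_summed ip_summed;
    bool csum_valid;        /* set once software has verified the sum */
    int csum_start_offset;  /* models skb_checksum_start_offset() */
};

/* Same logic as skb_csum_unnecessary(): no further verification if the
 * device vouched for the packet, software already validated it, or it is a
 * locally generated CHECKSUM_PARTIAL packet with a sane checksum start. */
bool fake_skb_csum_unnecessary(const struct fake_skb *skb)
{
    return skb->ip_summed == CHECKSUM_UNNECESSARY ||
           skb->csum_valid ||
           (skb->ip_summed == CHECKSUM_PARTIAL && skb->csum_start_offset >= 0);
}
```

With the states described in this thread, the third program's CHECKSUM_PARTIAL packet passes immediately; the MSG_MORE packets start as CHECKSUM_NONE and pass only if checksum_init managed to set csum_valid, which is what separates the second program from the first.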

The first and second programs are essentially the same, except for a
slight difference in length. In both cases the packet length is very
short compared to the loopback device MTU. Because of MSG_MORE, these
packets have CHECKSUM_NONE.
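For readers without the original reproducer, this is roughly how a MSG_MORE datagram is built from userspace. It is a hedged sketch, not the reporter's program: the function name `corked_loopback_roundtrip` is hypothetical, it uses IPv4 on 127.0.0.1 rather than whatever the reproducer used, and it keeps the payload at 20 bytes, well below the payload sizes at which drops were observed in this thread, so it is expected to be delivered even on an affected kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MSG_MORE and MSG_DONTWAIT are Linux-specific flags */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends one UDP datagram to itself over 127.0.0.1 in two pieces and returns
 * the number of bytes received, or -1 on error or timeout. */
ssize_t corked_loopback_roundtrip(const char *part1, size_t len1,
                                  const char *part2, size_t len2)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct pollfd pfd;
    char buf[2048];
    ssize_t n = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* MSG_MORE: append to a pending (corked) datagram without sending it. */
    if (sendto(fd, part1, len1, MSG_MORE,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    /* No MSG_MORE: the combined datagram is now transmitted. */
    if (sendto(fd, part2, len2, 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* Wait up to one second, then read without blocking, in case the queued
     * datagram is discarded during the copy. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 1000) == 1)
        n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
out:
    close(fd);
    return n;
}
```

Raising the combined payload above 68 bytes is the variant that, per the analysis in this thread, fails on affected kernels; here it would presumably show up as -1 instead of the combined length.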

On receive in

  __udp4_lib_rcv()
    udp4_csum_init()
      err = skb_checksum_init_zero_check()

For the second program, this validates the checksum and sets
ip_summed = CHECKSUM_COMPLETE and csum_valid = 1.
For the first, it does not, even though err == 0.

This appears to succeed consistently for packets with <= 68B of
payload and to fail consistently otherwise. It is not yet clear to me
what causes this distinction.
This is from

"
/* For small packets <= CHECKSUM_BREAK perform checksum complete directly
 * in checksum_init.
 */
#define CHECKSUM_BREAK 76
"

So a small packet gets checksummed immediately in
__skb_checksum_validate_complete, but a larger one does not: 68B of
payload plus the 8B UDP header is exactly CHECKSUM_BREAK (76B).
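The length test can be written out as a predicate. This sketch assumes, as the 68-byte observation suggests, that the length compared against CHECKSUM_BREAK at this point covers the 8-byte UDP header plus the payload; `checksummed_in_checksum_init` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

#define CHECKSUM_BREAK 76 /* value quoted from the kernel source above */
#define UDP_HDR_LEN 8     /* fixed size of the UDP header */

/* True if a datagram with this much payload is short enough to be fully
 * checksummed during checksum_init rather than deferred to recvmsg. */
bool checksummed_in_checksum_init(size_t udp_payload_len)
{
    return UDP_HDR_LEN + udp_payload_len <= CHECKSUM_BREAK;
}
```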

The question is why the copy-and-checksum path you pointed to appears
to fail the checksum. Manually calling __skb_checksum_complete(skb) in
skb_copy_and_csum_datagram_msg succeeds, so it must be
skb_copy_and_csum_datagram that returns an incorrect csum.
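A copy-and-checksum routine that walks an skb in pieces (linear area, then page fragments) has to merge per-piece partial sums, and each piece's position matters: one that starts at an odd byte offset has its bytes in the opposite halves of every 16-bit word, so its sum must be byte-rotated before being added, which is what the kernel's csum_block_add() takes an offset for. The sketch below illustrates only that arithmetic on a flat buffer split in two (all names are hypothetical); it is not a claim about where the regression lies.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide one's complement accumulator down to 16 bits. */
uint16_t csum_fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's complement sum of a chunk, treating its first byte as the high byte
 * of a 16-bit word, i.e. as if the chunk started at an even offset. */
uint16_t csum_chunk(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += (i & 1) ? p[i] : ((uint64_t)p[i] << 8);
    return csum_fold16(sum);
}

/* Add a chunk's sum to a running total, in the spirit of csum_block_add():
 * a chunk that began at an odd offset gets its sum byte-swapped first. */
uint16_t csum_combine(uint16_t total, uint16_t chunk_sum, size_t offset)
{
    if (offset & 1)
        chunk_sum = (uint16_t)((chunk_sum << 8) | (chunk_sum >> 8));
    return csum_fold16((uint64_t)total + chunk_sum);
}

/* Checksum a buffer as two chunks split at `split`, as a loop walking
 * separate fragments might. */
uint16_t csum_split(const uint8_t *p, size_t len, size_t split)
{
    uint16_t total = csum_chunk(p, split);

    return csum_combine(total, csum_chunk(p + split, len - split), split);
}
```

For the 11-byte buffer 1, 2, ..., 11 the correct sum is 0x241E; splitting at offset 3 and skipping the byte swap gives 0x2022 instead.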

Bisection shows that this is a regression in 5.0, between

65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper (fail)
d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper
950fcaecd5cc datagram: consolidate datagram copy to iter helpers
cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter (pass)

That's a significant amount of code change. I'll take a closer look,
but am checkpointing state for now.