Re: UDP implementation and the MSG_MORE flag
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2021-01-27 06:02:05
On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn [off-list ref] wrote:
On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn [off-list ref] wrote:
On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute [off-list ref] wrote:
Hello,

we observe some unexpected behavior in the UDP implementation of the Linux kernel.

Some UDP packets sent via the loopback interface are dropped in the kernel on the receive side when using sendto with the MSG_MORE flag. Every drop increases the InCsumErrors in /proc/self/net/snmp. Some example code to reproduce it is appended below.

In the code we tracked it down to this code section. (Even a little further, but it's unclear to me why the csum() is wrong in the bad case.)

udpv6_recvmsg()
...
	if (checksum_valid || udp_skb_csum_unnecessary(skb)) {
		if (udp_skb_is_linear(skb))
			err = copy_linear_skb(skb, copied, off, &msg->msg_iter);
		else
			err = skb_copy_datagram_msg(skb, off, msg, copied);
	} else {
		err = skb_copy_and_csum_datagram_msg(skb, off, msg);
		if (err == -EINVAL) {
			goto csum_copy_err;
		}
	}
...

Thanks for the report with a full reproducer.

I don't have a full answer yet, but can reproduce this easily.

The third program, without MSG_MORE, builds an skb with CHECKSUM_PARTIAL in __ip_append_data. When looped to the receive path, that ip_summed means no additional validation is needed, as encoded in skb_csum_unnecessary.

The first and second programs are essentially the same, bar a slight difference in length. In both cases packet length is very short compared to the loopback device MTU. Because of MSG_MORE, these packets have CHECKSUM_NONE.

On receive in

  __udp4_lib_rcv()
    udp4_csum_init()
      err = skb_checksum_init_zero_check()

The second program validates and sets ip_summed = CHECKSUM_COMPLETE and csum_valid = 1. The first does not, though err == 0.

This appears to succeed consistently for packets <= 68B of payload, fail consistently otherwise. It is not clear to me yet what causes this distinction.

This is from

"
/* For small packets <= CHECKSUM_BREAK perform checksum complete directly
 * in checksum_init.
 */
#define CHECKSUM_BREAK 76
"

So the small packet gets checksummed immediately in __skb_checksum_validate_complete, but the larger one does not.
Question is why the copy_and_checksum you pointed to seems to fail checksum.
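
For what it's worth, the 68B payload boundary observed earlier lines up with CHECKSUM_BREAK if the length being compared includes the 8-byte UDP header. The sketch below is a userspace model of just that comparison, not kernel code. The helper name validated_at_init is hypothetical, and the assumption that the compared length is header plus payload is mine, not something stated in the thread.

```c
#include <stdbool.h>

#define CHECKSUM_BREAK 76	/* value quoted in the thread */
#define UDP_HDR_LEN 8		/* fixed size of a UDP header */

/*
 * Hypothetical helper: returns true if a datagram with this payload
 * size would be checksummed right away at checksum-init time.
 * Assumption: the length compared against CHECKSUM_BREAK covers the
 * UDP header plus the payload.
 */
static bool validated_at_init(unsigned int payload_len)
{
	return payload_len + UDP_HDR_LEN <= CHECKSUM_BREAK;
}
```

Under that assumption a 68-byte payload (76 bytes with header) is validated immediately, while a 69-byte payload is deferred to the copy-and-checksum path at recvmsg time.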
Manually calling __skb_checksum_complete(skb) in skb_copy_and_csum_datagram_msg succeeds, so it is the skb_copy_and_csum_datagram that returns an incorrect csum.

Bisection shows that this is a regression in 5.0, between

  65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper (fail)
  d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper
  950fcaecd5cc datagram: consolidate datagram copy to iter helpers
  cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter (pass)

That's a significant amount of code change. I'll take a closer look, but checkpointing state for now.
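
To make the comparison above concrete: a checksum computed in one pass over the whole datagram and a checksum accumulated while copying it out in pieces are expected to agree. Below is a minimal userspace sketch of that invariant using the RFC 1071 Internet checksum. It is a model only, not the kernel's implementation, and it does not claim to identify the regression. All function names (csum_partial_model, csum_fold_model, csum_whole, csum_copy_chunked) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Add len bytes into a running one's-complement sum. 'offset' is the
 * position of buf[0] within the whole datagram: bytes at even positions
 * are the high half of a 16-bit word, bytes at odd positions the low half.
 */
static uint64_t csum_partial_model(const uint8_t *buf, size_t len,
				   size_t offset, uint64_t sum)
{
	for (size_t i = 0; i < len; i++) {
		if ((offset + i) & 1)
			sum += buf[i];
		else
			sum += (uint64_t)buf[i] << 8;
	}
	return sum;
}

/* Fold the carries back in and return the complemented 16-bit checksum. */
static uint16_t csum_fold_model(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Checksum the whole buffer in a single pass. */
static uint16_t csum_whole(const uint8_t *buf, size_t len)
{
	return csum_fold_model(csum_partial_model(buf, len, 0, 0));
}

/* Copy src to dst in pieces of at most 'chunk' bytes, checksumming as we go. */
static uint16_t csum_copy_chunked(uint8_t *dst, const uint8_t *src,
				  size_t len, size_t chunk)
{
	uint64_t sum = 0;
	size_t pos = 0;

	assert(chunk > 0);	/* a zero chunk size would never make progress */
	while (pos < len) {
		size_t n = len - pos < chunk ? len - pos : chunk;

		memcpy(dst + pos, src + pos, n);
		/* pass the running position so byte parity stays correct */
		sum = csum_partial_model(src + pos, n, pos, sum);
		pos += n;
	}
	return csum_fold_model(sum);
}
```

In this model the two results match for any chunk size because each piece is summed at its true position in the datagram. A variant that restarted the byte parity at every piece would disagree with the one-pass result whenever a piece other than the last has odd length.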