Re: [RFC 00/12] io_uring zerocopy send
From: Pavel Begunkov <asml.silence@gmail.com>
Date: 2021-12-01 19:11:57
Also in: lkml
On 12/1/21 17:57, David Ahern wrote:
> On 12/1/21 8:32 AM, Pavel Begunkov wrote:
>> Sure. First, for dummy I set mtu by hand, not sure can do it from
>> the userspace, can I? Without it __ip_append_data() falls into
>> non-zerocopy path.
[...]
>> modprobe dummy numdummies=1
>> ip link set dummy0 up
>
> No change is needed to the dummy driver:
>   ip li add dummy0 type dummy
>   ip li set dummy0 up mtu 65536
awesome, thanks!
>> # force requests to <dummy_ip_addr> go through the dummy device
>> ip route add <dummy_ip_addr> dev dummy0
>
> that command is not necessary.
>
>> With dummy I was just sinking the traffic to the dummy device,
>> was good enough for me. Omitting "taskset" and "nice":
>>
>> send-zc -4 -D <dummy_ip_addr> -t 10 udp
>>
>> Similarly with msg_zerocopy:
>>
>> <kernel>/tools/testing/selftests/net/msg_zerocopy -4 -p 6666 -D <dummy_ip_addr> -t 10 -z udp
>
> I get -ENOBUFS with '-z' and any local address.
Ah, right. Citing from Willem's MSG_ZEROCOPY letter:

"
Notification skbuffs are allocated from optmem. For sockets that
cannot effectively coalesce notifications, the optmem max may need
to be increased to avoid hitting -ENOBUFS:

  sysctl -w net.core.optmem_max=1048576
"
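A minimal sketch of checking the limit before a zerocopy run, in case it saves someone the -ENOBUFS round trip. `check_optmem` is a hypothetical helper, not part of the selftests, and the 1048576 default is simply the value from the letter above; the second argument exists only so the check can be pointed at a different file.

```shell
#!/bin/sh
# Hypothetical helper: report whether optmem_max is at least the wanted value.
# $1: wanted value in bytes (defaults to the value quoted from the letter)
# $2: file to read the current value from (defaults to the procfs entry)
check_optmem() {
    want=${1:-1048576}
    path=${2:-/proc/sys/net/core/optmem_max}
    # Bail out with a distinct status if the value cannot be read at all.
    cur=$(cat "$path" 2>/dev/null) || { echo "cannot read $path"; return 2; }
    if [ "$cur" -ge "$want" ]; then
        echo "optmem_max=$cur is enough"
        return 0
    fi
    echo "optmem_max=$cur is low, try: sysctl -w net.core.optmem_max=$want"
    return 1
}
```

Calling `check_optmem` with no arguments reads the live value; it only prints the suggested sysctl command and never changes anything itself.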
>> For loopback testing, as zerocopy is not allowed for it as Willem explained in
>> the original MSG_ZEROCOPY cover-letter, I used a hack to bypass it:
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index ebb12a7d386d..42df33b175ce 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -2854,9 +2854,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
>>  /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */
>>  static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
>>  {
>> -	if (likely(!skb_zcopy(skb)))
>> -		return 0;
>> -	return skb_copy_ubufs(skb, gfp_mask);
>> +	return skb_orphan_frags(skb, gfp_mask);
>>  }
>
> that is the key change that is missing in your repo. All local traffic
> (traffic to the address on a dummy device falls into this comment) goes
> through loopback. That's just the way Linux works. If you look at the
> dummy driver, its xmit function just drops packets if any actually make
> it there.
Not at all, the measurements were done without this patch. In case it may shed some light, I'm attaching a fresh flamegraph, same 115761.6 MB/s.

btw, why would a dummy device ever go through loopback? It doesn't seem to make sense, though I may be missing something.
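One way to settle which device a given destination actually uses is to ask the routing code with `ip route get`. For an address assigned to a local interface the lookup is expected to report `lo`, while an address that is only reachable via a route through the dummy device should report `dummy0`. A small sketch; `route_dev` is a hypothetical helper that just picks the device name out of that output.

```shell
#!/bin/sh
# Hypothetical helper: print the output device named in `ip route get`
# output read from stdin (the word following "dev").
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a real system (the address is a placeholder, as elsewhere in the thread):
#   ip route get <dummy_ip_addr> | route_dev
```

If it prints `lo`, the traffic is looped back locally; if it prints `dummy0`, the packets are handed to the dummy driver.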
>>> mileage varies quite a bit.
>>
>> Interesting, any brief notes on the setup and the results? Dummy
>
> VM on Chromebook. I just cloned your repos, built, install and test. As
> mentioned above, the skb_orphan_frags_rx change is missing from your
> repo and that is the key to your reported performance gains.
-- Pavel Begunkov
Attachments
- perf.svg [image/svg+xml] 118548 bytes