Re: [PATCH net-next v2 1/2] udp: msg_zerocopy
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2018-11-27 06:44:56
Subsystem:
networking [general], networking [ipv4/ipv6], the rest · Maintainers:
"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel, Linus Torvalds
On Mon, Nov 26, 2018 at 1:19 PM Willem de Bruijn [off-list ref] wrote:
On Mon, Nov 26, 2018 at 1:04 PM Paolo Abeni [off-list ref] wrote:quoted
On Mon, 2018-11-26 at 12:59 -0500, Willem de Bruijn wrote:quoted
The callers of this function do flush the queue of the other skbs on error, but only after the call to sock_zerocopy_put_abort. sock_zerocopy_put_abort depends on total rollback to revert the sk_zckey increment and suppress the completion notification (which must not happen on return with error). I don't immediately have a fix. Need to think about this some more..[still out of sheer ignorance] How about tacking a refcnt for the whole ip_append_data() scope, like in the tcp case? that will add an atomic op per loop (likely, hitting the cache) but will remove some code hunk in sock_zerocopy_put_abort() and sock_zerocopy_alloc().The atomic op pair is indeed what I was trying to avoid. But I also need to solve the problem that the final decrement will happen from the freeing of the other skbs in __ip_flush_pending_frames, and will not suppress the notification. Freeing the entire queue inside __ip_append_data, effectively making it a true noop on error is one approach. But that is invasive, also to non zerocopy codepaths, so I would rather avoid that. Perhaps I need to handle the abort logic in udp_sendmsg directly, after both __ip_append_data and __ip_flush_pending_frames.
Actually, (1) the notification suppression is handled correctly, as .._abort decrements uarg->len. If now zero, this suppresses the notification in sock_zerocopy_callback, regardless whether that callback is called right away or from a later kfree_skb. (2) if moving skb_zcopy_set below getfrag, then no kfree_skb will be called on a zerocopy skb inside __ip_append_data. So on abort the refcount is exactly the number of zerocopy skbs on the queue that will call sock_zerocopy_put later. Abort then only needs to handle special case zero, and call sock_zerocopy_put right away. Tentative fix on top of v2 (I'll squash into v3): ---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2179ef84bb44..4b21a58329d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c@@ -1099,12 +1099,13 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg) - if (sk->sk_type != SOCK_STREAM && !refcount_read(&uarg->refcnt)) + if (sk->sk_type != SOCK_STREAM &&
!refcount_read(&uarg->refcnt)) {
refcount_set(&uarg->refcnt, 1);
-
- sock_zerocopy_put(uarg);
+ sock_zerocopy_put(uarg);
+ }
}
}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7504da2f33d6..a19396e21b35 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c@@ -1014,13 +1014,6 @@ static int __ip_append_data(struct sock *sk, skb->csum = 0; skb_reserve(skb, hh_len); - /* only the initial fragment is time stamped */ - skb_shinfo(skb)->tx_flags = cork->tx_flags; - cork->tx_flags = 0; - skb_shinfo(skb)->tskey = tskey; - tskey = 0; - skb_zcopy_set(skb, uarg); - /* * Find where to start putting bytes. */
@@ -1053,6 +1046,13 @@ static int __ip_append_data(struct sock *sk, exthdrlen = 0; csummode = CHECKSUM_NONE; + /* only the initial fragment is time stamped */ + skb_shinfo(skb)->tx_flags = cork->tx_flags; + cork->tx_flags = 0; + skb_shinfo(skb)->tskey = tskey; + tskey = 0; + skb_zcopy_set(skb, uarg); + if ((flags & MSG_CONFIRM) && !skb_prev) skb_set_dst_pending_confirm(skb, 1); ---
This patch moves all the skb_shinfo touches operations after the copy,
to avoid touching that twice.
Instead of the refcnt trick, I could also refactor sock_zerocopy_put
and call __sock_zerocopy_put
---
-void sock_zerocopy_put(struct ubuf_info *uarg)
+static void __sock_zerocopy_put(struct ubuf_info *uarg)
{
- if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
- if (uarg->callback)
- uarg->callback(uarg, uarg->zerocopy);
- else
- consume_skb(skb_from_uarg(uarg));
- }
+ if (uarg->callback)
+ uarg->callback(uarg, uarg->zerocopy);
+ else
+ consume_skb(skb_from_uarg(uarg));
+}
+
+void sock_zerocopy_put(struct ubuf_info *uarg)
+ if (uarg && refcount_dec_and_test(&uarg->refcnt))
+ __sock_zerocopy_put(uarg);
}
EXPORT_SYMBOL_GPL(sock_zerocopy_put);
---