Thread (13 messages) 13 messages, 4 authors, 2009-09-30

Re: [PATCH] net: Fix sock_wfree() race

From: Eric Dumazet <hidden>
Date: 2009-09-23 13:44:17
Also in: lkml
Subsystem: networking [general], networking [sockets], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Willem de Bruijn, Linus Torvalds

David Miller a écrit :
From: David Miller <davem@davemloft.net>
Date: Fri, 11 Sep 2009 11:43:37 -0700 (PDT)
quoted
From: Eric Dumazet <redacted>
Date: Wed, 09 Sep 2009 00:49:31 +0200
quoted
[PATCH] net: Fix sock_wfree() race

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

Fix is to call sk->sk_write_space(sk) only
while still holding a reference on sk.

Since doing this call is done before the 
atomic_sub(truesize, &sk->sk_wmem_alloc), we should pass truesize as 
a bias for possible sk_wmem_alloc evaluations.

Reported-by: Jike Song <redacted>
Signed-off-by: Eric Dumazet <redacted>
Applied to net-next-2.6, thanks.  I'll queue up your simpler
version for -stable.
Eric, I have to revert, as you didn't update the callbacks
of several protocols such as SCTP and RDS in this change.

Let me know when you have a fixed version of this patch :-)
Sorry for the delay David. But this is complex. I am not
sure we can do a clean and safe thing, not counting
the added bloat.

If we do :

void sock_wfree(struct sk_buff *skb)
{
        struct sock *sk = skb->sk;
        int res;

        if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
                sk->sk_write_space(sk, skb->truesize);

        res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
        /*
         * if sk_wmem_alloc reached 0, we are last user and should
         * free this sock, as sk_free() call could not do it.
         */
        if (res == 0)
                __sk_free(sk);
}


There is still a possibility multiple cpus call sock_wfree()
for the same socket, and that they all call sk_write_space()
with their bias, yet the protocol still has a possible too
big estimation of sk_wmem_alloc

We could miss to wakeup a blocked writer in case low sk->sk_sndbuf
values are setup. (One could argue that with small sk_sndbuf
values we should not have many packets in flight : Keep in mind
sk_sndbuf can be lowered by the user)


With second patch we instead have :

void sock_wfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	unsigned int len = skb->truesize;

	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
		/*
		 * Keep a reference on sk_wmem_alloc, this will be released
		 * after sk_write_space() call
		 */
		atomic_sub(len - 1, &sk->sk_wmem_alloc);
		sk->sk_write_space(sk);
		len = 1;
	}
	/*
	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
	 * could not do because of in-flight packets
	 */
	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
		__sk_free(sk);
}

The accumulated transient error on sk_wmem_alloc is then < num_online_cpus(),
that should be OK even for very small sk_sndbuf values.

Of course TCP doesnt have to pay the price of sk_write_space() and the second
atomic operation re-added by this fix.

Here is the patch for reference :

[PATCH] net: Fix sock_wfree() race

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.


Reported-by: Jike Song <redacted>
Signed-off-by: Eric Dumazet <redacted>
---
 net/core/sock.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..e1f034e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1228,17 +1228,22 @@ void __init sk_init(void)
 void sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
-	int res;
+	unsigned int len = skb->truesize;
 
-	/* In case it might be waiting for more memory. */
-	res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
-	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
+	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
+		/*
+		 * Keep a reference on sk_wmem_alloc, this will be released
+		 * after sk_write_space() call
+		 */
+		atomic_sub(len - 1, &sk->sk_wmem_alloc);
 		sk->sk_write_space(sk);
+		len = 1;
+	}
 	/*
-	 * if sk_wmem_alloc reached 0, we are last user and should
-	 * free this sock, as sk_free() call could not do it.
+	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
+	 * could not do because of in-flight packets
 	 */
-	if (res == 0)
+	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help