Thread (41 messages) 41 messages, 7 authors, 2014-07-25

Re: [PATCH] ipv4: fix a race in ip4_datagram_release_cb()

From: Eric Dumazet <hidden>
Date: 2014-06-23 08:33:38
Subsystem: networking [general], networking [ipv4/ipv6], networking [sockets], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, David Ahern, Ido Schimmel, Kuniyuki Iwashima, Willem de Bruijn, Linus Torvalds

On Sun, 2014-06-22 at 12:07 -0700, dormando wrote:
Update on testing:

I only have two machines that crash on their own frequently (more like
one, even). Unfortunately something happened to the datacenter it's in and
it was offline for a week. The machine normally crashes after 1.5-4d,
averaging 2d.

It's done about three days total time without a new crash. I also have the
kernel running in another datacenter for ~10 days.. but it takes 30-150
days to crash in that one.

So, inconclusive, but still promising. If the machine survives the week it
probably means it's fixed, or at least greatly reduced.

I saw that one of your patches got queued for stable, but all three were
necessary to fix udpkill. What's your plan for cleanup/upstreaming?

Did you folks end up running udpkill under the tester thing?
I did not test udpkill, as the known problem is the DST_NOCACHE flag.

We end up calling sk_dst_set(sk, dst) with a dst having this flag set.

So maybe DST_NOCACHE should be renamed, if we _can _ cache a dst like
this. Its meaning is really that dst_release() has to track when
refcount reaches 0 so that last owner fress dst, but we need to respect
rcu grace period.

Fixing sk_dst_set() as I did is not enough, as it is only reducing race
window.

Something like following :

 include/net/sock.h   |    4 ++--
 net/core/dst.c       |   16 +++++++++++-----
 net/ipv4/ip_tunnel.c |    7 +------
 3 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 07b7fcd60d80..173cae485de1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1730,8 +1730,8 @@ sk_dst_get(struct sock *sk)
 
 	rcu_read_lock();
 	dst = rcu_dereference(sk->sk_dst_cache);
-	if (dst)
-		dst_hold(dst);
+	if (dst && !atomic_inc_not_zero(&dst->__refcnt))
+		dst = NULL;
 	rcu_read_unlock();
 	return dst;
 }
diff --git a/net/core/dst.c b/net/core/dst.c
index 80d6286c8b62..a028409ee438 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -269,6 +269,15 @@ again:
 }
 EXPORT_SYMBOL(dst_destroy);
 
+static void dst_destroy_rcu(struct rcu_head *head)
+{
+	struct dst_entry *dst = container_of(head, struct dst_entry, rcu_head);
+
+	dst = dst_destroy(dst);
+	if (dst)
+		__dst_free(dst);
+}
+
 void dst_release(struct dst_entry *dst)
 {
 	if (dst) {
@@ -276,11 +285,8 @@ void dst_release(struct dst_entry *dst)
 
 		newrefcnt = atomic_dec_return(&dst->__refcnt);
 		WARN_ON(newrefcnt < 0);
-		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt) {
-			dst = dst_destroy(dst);
-			if (dst)
-				__dst_free(dst);
-		}
+		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
+			call_rcu(&dst->rcu_head, dst_destroy_rcu);
 	}
 }
 EXPORT_SYMBOL(dst_release);
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 097b3e7c1e8f..cc3b7fd34555 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -73,12 +73,7 @@ static void __tunnel_dst_set(struct ip_tunnel_dst *idst,
 {
 	struct dst_entry *old_dst;
 
-	if (dst) {
-		if (dst->flags & DST_NOCACHE)
-			dst = NULL;
-		else
-			dst_clone(dst);
-	}
+	dst_clone(dst);
 	old_dst = xchg((__force struct dst_entry **)&idst->dst, dst);
 	dst_release(old_dst);
 }
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help