[PATCH] tcp: fix inet_twsk_deschedule()
From: Eric Dumazet <hidden>
Date: 2011-02-19 08:36:07
Also in:
linux-mm, lkml
Subsystem:
networking [general], networking [tcp], the rest · Maintainers:
"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Linus Torvalds
Le vendredi 18 février 2011 à 12:38 -0800, Eric W. Biederman a écrit :
Arnaldo Carvalho de Melo [off-list ref] writes:quoted
Em Fri, Feb 18, 2011 at 05:01:28PM -0200, Arnaldo Carvalho de Melo escreveu:quoted
Em Fri, Feb 18, 2011 at 10:48:18AM -0800, Linus Torvalds escreveu:quoted
This seems to be a fairly straightforward bug. In net/ipv4/inet_timewait_sock.c we have this: /* These are always called from BH context. See callers in * tcp_input.c to verify this. */ /* This is for handling early-kills of TIME_WAIT sockets. */ void inet_twsk_deschedule(struct inet_timewait_sock *tw, struct inet_timewait_death_row *twdr) { spin_lock(&twdr->death_lock); .. and the intention is clearly that that spin_lock is BH-safe because it's called from BH context. Except that clearly isn't true. It's called from a worker thread:quoted
stack backtrace: Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1 Call Trace: [<ffffffff81460e69>] ? inet_twsk_deschedule+0x29/0xa0 [<ffffffff81460fd6>] ? inet_twsk_purge+0xf6/0x180 [<ffffffff81460f10>] ? inet_twsk_purge+0x30/0x180 [<ffffffff814760fc>] ? tcp_sk_exit_batch+0x1c/0x20 [<ffffffff8141c1d3>] ? ops_exit_list.clone.0+0x53/0x60 [<ffffffff8141c520>] ? cleanup_net+0x100/0x1b0 [<ffffffff81068c47>] ? process_one_work+0x187/0x4b0 [<ffffffff81068be1>] ? process_one_work+0x121/0x4b0 [<ffffffff8141c420>] ? cleanup_net+0x0/0x1b0 [<ffffffff8106a65c>] ? worker_thread+0x15c/0x330so it can deadlock with a BH happening at the same time, afaik. The code (and comment) is all from 2005, it looks like the BH->worker thread has broken the code. But somebody who knows that code better should take a deeper look at it. Added acme to the cc, since the code is attributed to him back in 2005 ;). Although I don't know how active he's been in networking lately (seems to be all perf-related). Whatever, it can't hurt.Original code is ANK's, I just made it possible to use with DCCP, and yeah, the smiley is appropriate, something 6 years old and the world around it changing continually... well, thanks for the git blame ;-)But yeah, your analisys seems correct, with the bug being introduced by one of these world around it changing continually issues, networking namespaces broke the rules of the game on its cleanup_net() routine, adding Pavel to the CC list since it doesn't hurt ;-)Which probably gets the bug back around to me. I guess this must be one of those ipv4 cases that where the cleanup simply did not exist in the rmmod sense that we had to invent. I think that was Daniel who did the time wait sockets. I do remember they were a real pain. Would a bh_disable be sufficient? I guess I should stop remembering and look at the code now.
Here is the patch to fix the problem Daniel commit (d315492b1a6ba29d (netns : fix kernel panic in timewait socket destruction) was OK (it did use local_bh_disable()) Problem comes from commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.) added in 2.6.33 Thanks ! [PATCH] tcp: fix inet_twsk_deschedule() Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule() This is caused by inet_twsk_purge(), run from process context, and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.) removed the BH disabling that was necessary. Add the BH disabling but fine grained, right before calling inet_twsk_deschedule(), instead of whole function. With help from Linus Torvalds and Eric W. Biederman Reported-by: Eric W. Biederman <redacted> Signed-off-by: Eric Dumazet <redacted> CC: Daniel Lezcano <redacted> CC: Pavel Emelyanov <redacted> CC: Arnaldo Carvalho de Melo <redacted> CC: stable <stable@kernel.org> (# 2.6.33+) --- net/ipv4/inet_timewait_sock.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index c5af909..3c8dfa1 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c@@ -505,7 +505,9 @@ restart: } rcu_read_unlock(); + local_bh_disable(); inet_twsk_deschedule(tw, twdr); + local_bh_enable(); inet_twsk_put(tw); goto restart_rcu; } --
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>