Thread (9 messages) 9 messages, 3 authors, 2008-02-12

Re: [ROSE] [AX25] possible circular locking

From: Pidoux <hidden>
Date: 2007-12-28 21:30:32

Jarek Poplawski wrote :
quoted hunk ↗ jump to hunk
On Mon, Dec 17, 2007 at 11:06:04AM +0100, Bernard Pidoux F6BVP wrote:
  
quoted
Hi,


When I killall kissattach I can see the following message.

This happens on kernel 2.6.24-rc5 already patched with the 6 previously
patches I sent recently.


=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.23.9 #1
-------------------------------------------------------
kissattach/2906 is trying to acquire lock:
 (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]

but task is already holding lock:
 (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
[ax25]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:
    
...

It seems, lockdep is warried about the different order here:

#1 (rose_neigh_list_lock){-+..}:
#3 (ax25_list_lock){-+..}:

#0 (linkfail_lock){-+..}:
#1 (rose_neigh_list_lock){-+..}:

#3 (ax25_list_lock){-+..}:
#0 (linkfail_lock){-+..}:

So, ax25_list_lock could be taken before and after linkfail_lock. 
I don't know if this three-thread clutch is very probable (or
possible at all), but it seems this other bug nearby reported by
Bernard ("[...] system impossible to reboot with linux-2.6.24-rc5")
could have similar source - namely ax25_list_lock held by
ax25_kill_by_device() during ax25_disconnect(). It looks like the
only place which calls ax25_disconnect() this way, so I guess, it
isn't necessary. But, since I don't know AX25 & ROSE at all, this
should be necessarily verified by somebody who knows these things.

I attach here my very experimental proposal with breaking the lock
for ax25_disconnect(), with some failsafe and debugging because of
this, but, if in this special case the lock is required for some
other reasons, then this patch should be dumped, of course.

Regards,
Jarek P.

WARNING:
not tested, not even compiled, needs some ack before testing!

---

diff -Nurp linux-2.6.24-rc5-/net/ax25/af_ax25.c linux-2.6.24-rc5+/net/ax25/af_ax25.c
--- linux-2.6.24-rc5-/net/ax25/af_ax25.c	2007-12-17 13:29:19.000000000 +0100
+++ linux-2.6.24-rc5+/net/ax25/af_ax25.c	2007-12-18 13:36:05.000000000 +0100
@@ -87,10 +87,19 @@ static void ax25_kill_by_device(struct n
 		return;
 
 	spin_lock_bh(&ax25_list_lock);
+again:
 	ax25_for_each(s, node, &ax25_list) {
 		if (s->ax25_dev == ax25_dev) {
+			struct hlist_node *nn = node->next;
+
 			s->ax25_dev = NULL;
+			spin_unlock_bh(&ax25_list_lock);
 			ax25_disconnect(s, ENETUNREACH);
+			spin_lock_bh(&ax25_list_lock);
+			if (nn != node->next) {
+				WARN_ON_ONCE(1);
+				goto again;
+			}
 		}
 	}
 	spin_unlock_bh(&ax25_list_lock);


  
After a few days of observation and a number of reboot for test purpose, 
I confirm that your patch is doing very well.
I have no more problems rebooting and the AX25 applications are running 
fine.

I hope this patch, with or without warning, could be applied in next 
kernel release.

Thanks again Jarek.

Regards from Bernard P.
f6bvp
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help