Thread: 20 messages, 7 authors, 2004-12-28

Re: Lockup with 2.6.9-ac15 related to netconsole

From: Matt Mackall <hidden>
Date: 2004-12-21 00:56:14
Also in: lkml

On Tue, Dec 21, 2004 at 01:22:18AM +0100, Francois Romieu wrote:
> Matt Mackall [off-list ref] :
> > On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote:
> > > Exactly the same happens, I still get a 'NMI Watchdog detected LOCKUP'
> > > with the r8169 device using the above patch on top of 2.6.10-rc3-bk10.
> >
> > Ok, that suggests a problem localized to netpoll itself. Do you have
> > spinlock debugging turned on by any chance?
>
> Any chance of:
> 1 dev_queue_xmit
> 2 dev->xmit_lock taken
> 3 interruption
> 4 printk
> 5 netconsole write
> 6 dev->xmit_lock again
> 7 lockup
>
> ?
>
> This is probably the silly question of the day.

Maybe, but the answer isn't obvious to me at the moment as I haven't
been thinking about such stuff enough lately. Silly response of the
day:

Mark, can you try this (again completely untested, but at least
compiles) patch? I'm afraid I don't have a proper test rig to
reproduce this at the moment. This will attempt to grab the lock, and
if it fails, will check for recursion. Then it will try to print a
message on the local console, temporarily disabling netconsole to
allow the printk to get through.

Index: l/net/core/netpoll.c
===================================================================
--- l.orig/net/core/netpoll.c	2004-11-04 10:53:23.388610000 -0800
+++ l/net/core/netpoll.c	2004-12-20 16:45:40.212709000 -0800
@@ -31,6 +31,8 @@
 #define MAX_SKBS 32
 #define MAX_UDP_CHUNK 1460
 
+static int netpoll_kill;
+
 static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED;
 static int nr_skbs;
 static struct sk_buff *skbs;
@@ -183,13 +185,24 @@
 	int status;
 
 repeat:
-	if(!np || !np->dev || !netif_running(np->dev)) {
+	if(!np || !np->dev || !netif_running(np->dev) || netpoll_kill) {
 		__kfree_skb(skb);
 		return;
 	}
 
-	spin_lock(&np->dev->xmit_lock);
-	np->dev->xmit_lock_owner = smp_processor_id();
+	if(spin_trylock(&np->dev->xmit_lock))
+		np->dev->xmit_lock_owner = smp_processor_id();
+	else {
+		if(np->dev->xmit_lock_owner == smp_processor_id()) {
+			netpoll_kill = 1;
+			__kfree_skb(skb);
+			printk("Tried to recursively get dev->xmit_lock\n");
+			netpoll_kill = 0;
+			return;
+		}
+		spin_lock(&np->dev->xmit_lock);
+		np->dev->xmit_lock_owner = smp_processor_id();
+	}
 
 	/*
 	 * network drivers do not expect to be called if the queue is
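
For illustration only, and not part of the patch: the trylock-then-check-owner
idea can be modelled in a few lines of single-threaded userspace C. Everything
here is a hypothetical stand-in (struct xmit_lock, poll_send, log_msg,
sink_disabled play the roles of dev->xmit_lock, the netpoll send path, printk
and netpoll_kill); a real spinlock would spin where this sketch returns
WOULD_SPIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for dev->xmit_lock plus dev->xmit_lock_owner. */
struct xmit_lock {
	bool locked;
	int owner;		/* CPU id of the holder, or NO_OWNER */
};

enum send_result { SENT, DROPPED_DISABLED, DROPPED_RECURSION, WOULD_SPIN };

static int sink_disabled;	/* plays the role of netpoll_kill */

static enum send_result poll_send(struct xmit_lock *l, int cpu);

static bool try_lock(struct xmit_lock *l, int cpu)
{
	if (l->locked)
		return false;
	l->locked = true;
	l->owner = cpu;
	return true;
}

static void unlock(struct xmit_lock *l, int cpu)
{
	assert(l->locked && l->owner == cpu);
	l->owner = NO_OWNER;
	l->locked = false;
}

/* Models printk: writes to the local console (stderr here), then hands the
 * same message to the network sink, as a registered netconsole would. */
static void log_msg(struct xmit_lock *l, int cpu, const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	(void)poll_send(l, cpu);
}

/* Models the patched send path: bail out if the sink is disabled, try the
 * lock, and on failure distinguish "this CPU already holds it" (recursion)
 * from "another CPU holds it" (ordinary contention). */
static enum send_result poll_send(struct xmit_lock *l, int cpu)
{
	if (sink_disabled)
		return DROPPED_DISABLED;

	if (!try_lock(l, cpu)) {
		if (l->owner == cpu) {
			/* Disable the sink so the warning below cannot
			 * re-enter this path and recurse forever. */
			sink_disabled = 1;
			log_msg(l, cpu, "Tried to recursively get xmit lock");
			sink_disabled = 0;
			return DROPPED_RECURSION;
		}
		return WOULD_SPIN;	/* real code would spin_lock() here */
	}

	/* ... the packet would be handed to the driver here ... */
	unlock(l, cpu);
	return SENT;
}
```

Taking the lock as CPU 0 and then calling poll_send(&l, 0) mimics steps 2-6 of
the sequence quoted above: the model drops the packet and prints the warning
instead of deadlocking, while the same call from CPU 1 is reported as plain
contention.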

-- 
Mathematics is the supreme nostalgia of our time.