Re: Lockup with 2.6.9-ac15 related to netconsole
From: Matt Mackall <hidden>
Date: 2004-12-21 00:56:14
Also in:
lkml
On Tue, Dec 21, 2004 at 01:22:18AM +0100, Francois Romieu wrote:
> Matt Mackall [off-list ref] :
> > On Mon, Dec 20, 2004 at 09:42:08AM -0000, Mark Broadbent wrote:
> > > Exactly the same happens, I still get a 'NMI Watchdog detected
> > > LOCKUP' with the r8169 device using the above patch on top of
> > > 2.6.10-rc3-bk10.
> >
> > Ok, that suggests a problem localized to netpoll itself. Do you have
> > spinlock debugging turned on by any chance?
>
> Any chance of:
> 1 dev_queue_xmit
> 2 dev->xmit_lock taken
> 3 interruption
> 4 printk
> 5 netconsole write
> 6 dev->xmit_lock again
> 7 lockup
> ?
>
> This is probably the silly question of the day.

Maybe, but the answer isn't obvious to me at the moment, as I haven't
been thinking about such stuff enough lately.

Silly response of the day: Mark, can you try this patch? It is again
completely untested, but it at least compiles. I'm afraid I don't have
a proper test rig to reproduce this at the moment.

This will attempt to grab the lock and, if that fails, check whether
the current CPU already holds it (recursion). If so, it drops the
packet and tries to print a message on the local console, temporarily
disabling netconsole to allow the printk to get through.

Index: l/net/core/netpoll.c
===================================================================
--- l.orig/net/core/netpoll.c 2004-11-04 10:53:23.388610000 -0800
+++ l/net/core/netpoll.c 2004-12-20 16:45:40.212709000 -0800
@@ -31,6 +31,8 @@
 #define MAX_SKBS 32
 #define MAX_UDP_CHUNK 1460
 
+static int netpoll_kill;
+
 static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED;
 static int nr_skbs;
 static struct sk_buff *skbs;
@@ -183,13 +185,24 @@
 	int status;
 
 repeat:
-	if(!np || !np->dev || !netif_running(np->dev)) {
+	if(!np || !np->dev || !netif_running(np->dev) || netpoll_kill) {
 		__kfree_skb(skb);
 		return;
 	}
 
-	spin_lock(&np->dev->xmit_lock);
-	np->dev->xmit_lock_owner = smp_processor_id();
+	if(spin_trylock(&np->dev->xmit_lock))
+		np->dev->xmit_lock_owner = smp_processor_id();
+	else {
+		if(np->dev->xmit_lock_owner == smp_processor_id()) {
+			netpoll_kill = 1;
+			__kfree_skb(skb);
+			printk("Tried to recursively get dev->xmit_lock");
+			netpoll_kill = 0;
+			return;
+		}
+		spin_lock(&np->dev->xmit_lock);
+
+	}
 
 	/*
 	 * network drivers do not expect to be called if the queue is
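
For anyone who wants to poke at the trylock-then-check-owner idea outside
the kernel, here is a rough user-space sketch. It is only an illustration
under assumptions, not the netpoll code: struct fake_dev, NO_OWNER,
fake_dev_init(), xmit_lock_checked() and xmit_unlock() are made-up names, a
pthread mutex stands in for the spinlock, and the caller passes an integer
`self` in place of smp_processor_id(). Unlike the patch as posted, the
sketch records the owner on both the trylock path and the blocking path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for the two net_device fields the patch touches. */
struct fake_dev {
	pthread_mutex_t xmit_lock;
	atomic_int xmit_lock_owner;	/* id of the current holder, or NO_OWNER */
};

static void fake_dev_init(struct fake_dev *dev)
{
	pthread_mutex_init(&dev->xmit_lock, NULL);
	atomic_init(&dev->xmit_lock_owner, NO_OWNER);
}

/*
 * Take xmit_lock on behalf of context `self` (plays the role of
 * smp_processor_id()). Returns 0 with the lock held, or -1 without the
 * lock when `self` already holds it, i.e. blocking would self-deadlock.
 */
static int xmit_lock_checked(struct fake_dev *dev, int self)
{
	if (pthread_mutex_trylock(&dev->xmit_lock) != 0) {
		/* Lock is busy: find out whether we are the holder. */
		if (atomic_load(&dev->xmit_lock_owner) == self)
			return -1;	/* recursion: caller should drop the packet */
		/* Held by some other context: waiting for it is safe. */
		pthread_mutex_lock(&dev->xmit_lock);
	}
	atomic_store(&dev->xmit_lock_owner, self);
	return 0;
}

/* Release the lock; only the recorded owner may call this. */
static void xmit_unlock(struct fake_dev *dev, int self)
{
	assert(atomic_load(&dev->xmit_lock_owner) == self);
	atomic_store(&dev->xmit_lock_owner, NO_OWNER);
	pthread_mutex_unlock(&dev->xmit_lock);
}
```

A caller would treat a -1 return the way the patch treats the recursive
case: free the packet and return instead of spinning on a lock it already
holds.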
--
Mathematics is the supreme nostalgia of our time.