Thread: 64 messages, 7 authors, 2007-01-12

Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

From: Ben Greear <hidden>
Date: 2007-01-04 01:03:17

Herbert Xu wrote:
David Stevens [off-list ref] wrote:
Ben,
       Here's a patch that I think will fix it, assuming the receive is on the
same device as the initialization. Can you try this out?
Hi David:

Your patch makes sense on its own but I don't see the direct connection
to the soft lock-up.  Sure it prevents the code path in question from
triggering.  However, if we don't understand why it's locking up in the
first place then this may just be hiding it rather than fixing it.

In particular, a soft lockup means that we're doing so much work in
the softirq handlers that processes are not getting run.  So what is
it exactly here that's causing us to get stuck in the softirq handlers?
Is it because we're somehow getting stuck in a net rx loop?
I'm not sure if it helps, but I did notice that 'ip' was using 99% of the
CPU on the system.  Could this be because it was spinning trying to acquire
the read-lock?  When I ran 'ifconfig -a', that process hung, and at that point
the system was rebooted.  Before I ran ifconfig, 'top', 'ls', and similar
apps were responding fine, and I was logged in over ssh from the US to Australia, so
its basic networking was functioning.

What if the race is that the read-lock is only half initialized, so that
it doesn't trigger the uninitialized-lock-use debug message, but still screws
up and will not ever let the reader acquire the lock?
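
To make that hypothesis concrete, here is a minimal userspace sketch in C.
It is not the kernel's rwlock code.  It assumes a bias-counter style lock in
which a free lock holds a nonzero bias and each reader subtracts one; every
name and constant below (model_rwlock_t, MODEL_RW_BIAS, and so on) is
hypothetical and exists only for illustration.  The point it shows: if the
lock word is still zero when a reader arrives, every read attempt looks like
"a writer holds it", so the reader can never get in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias for this model: a free lock holds exactly this value. */
#define MODEL_RW_BIAS 0x01000000

typedef struct {
    atomic_int count;   /* MODEL_RW_BIAS when free, minus 1 per active reader */
} model_rwlock_t;

/* Full initialization: the lock word is set to the bias. */
static void model_rwlock_init(model_rwlock_t *l)
{
    atomic_init(&l->count, MODEL_RW_BIAS);
}

/* Stand-in for "half initialized": memory zeroed, bias not yet written. */
static void model_rwlock_zero(model_rwlock_t *l)
{
    atomic_init(&l->count, 0);
}

/* One read attempt: claim a reader slot, back out if the result is negative. */
static bool model_read_trylock(model_rwlock_t *l)
{
    if (atomic_fetch_sub(&l->count, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&l->count, 1);   /* undo the failed claim */
    return false;
}

static void model_read_unlock(model_rwlock_t *l)
{
    atomic_fetch_add(&l->count, 1);
}

/*
 * Spin with an upper bound so the demo terminates.
 * Returns the number of attempts needed, or -1 if the lock was never acquired.
 */
static int model_read_lock_bounded(model_rwlock_t *l, int max_spins)
{
    assert(max_spins > 0);
    for (int i = 1; i <= max_spins; i++)
        if (model_read_trylock(l))
            return i;
    return -1;
}
```

With a properly initialized lock, model_read_lock_bounded() succeeds on the
first attempt.  With the zeroed lock it fails on every attempt, and a reader
spinning without the bound would burn CPU indefinitely.  That would be
consistent with a process pegged at 99%, but the sketch only illustrates the
idea; it does not show that this is what actually happened here.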

Thanks,
Ben
Cheers,

-- 
Ben Greear [off-list ref]
Candela Technologies Inc  http://www.candelatech.com