Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)
From: Ben Greear <hidden>
Date: 2007-01-05 20:35:08
David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 04 Jan 2007 17:26:27 +1100
>
>> David Stevens [off-list ref] wrote:
>>> You're right, I don't know whether it'll fix the problem Ben saw
>>> or not, but it looks like the original code can do a receive before
>>> the in_device is fully initialized, and that, of course, is bad.
>>> If the device for ip_rcv() is not the same one we were
>>> initializing when the receive interrupted, then the patch should
>>> have no effect either way -- I don't think it'll hide other
>>> problems. If it's hard to reproduce (which I guess is true), then
>>> you're right, no soft lockup doesn't really tell us if it's fixed
>>> or not.
>>
>> Actually I missed your point that the multicast locks aren't even
>> initialised at that point. So this does explain the soft lock-up
>> and therefore your patch is clearly the correct solution.
>
> I agree too, therefore I've added David's patch as below. I'll push
> this to the -stable branches as well.
>
> This fix is correct even if it does not entirely clear up the soft
> lockup bug being discussed in this thread, but I think it will :-)

We were able to reproduce the problem twice on the un-patched 2.6.18.2
kernel in about 2 hours of our stress test yesterday.

I applied this patch (well, the ipv4 part..the ipv6 won't apply to
2.6.18.2), and it has run the stress test clean for a total of about
8 hours.

So, I do believe this was the problem we were hitting, and it seems
fixed.

Thanks!
Ben

-- 
Ben Greear [off-list ref]
Candela Technologies Inc  http://www.candelatech.com