Re: 2.6.24 BUG: soft lockup - CPU#X
From: Matheos Worku <hidden>
Date: 2008-03-28 00:20:58
David Miller wrote:
> From: Matheos Worku <redacted>
> Date: Thu, 27 Mar 2008 16:45:06 -0700
>
> > Brandeburg, Jesse wrote:
> > > Jarek Poplawski wrote:
> > > > On Wed, Mar 26, 2008 at 01:26:00PM -0700, Matheos Worku wrote:
> > > > ...
> > > > > nsn57-110 login: BUG: soft lockup - CPU#2 stuck for 11s!
> > > > > ...
> > > > > Call Trace:
> > > > > [<ffffffff803ef5f6>] __skb_clone+0x24/0xdc
> > > > > [<ffffffff803f152e>] skb_realloc_headroom+0x30/0x63
> > > > > [<ffffffff882edd40>] :niu:niu_start_xmit+0x114/0x5af
> > > > > [<ffffffff80221995>] gart_map_single+0x0/0x70
> > > > > [<ffffffff803f5e2b>] dev_hard_start_xmit+0x1d2/0x246
> > > > > ...
> > > >
> > > > Maybe I'm wrong with this again, but I wonder about this
> > > > gart_map_single on almost all traces, and probably not supposed
> > > > to be seen here. Did you try with some memory re-config/debugging?
> > >
> > > I have some more examples of this but with the ixgbe driver. We are
> > > running heavy bidirectional stress with multiple rx (non-napi, yeah
> > > I know) interrupts by default (and userspace irqbalance is probably
> > > on, I'll have the lab try it without)
> >
> > I have seen the lockup on kernels 2.6.18 and newer mostly on TX
> > traffic. I have seen it on another 10G driver (off the tree niu
> > driver sibling, nxge). The nxge driver doesn't use any TX interrupts
> > and I have seen it with UDP TX, irqbalance disabled, with no irq
> > activity at all. some example traces included.
>
> Interesting. Are you running uperf in a way such that there are
> multiple processors doing TX's in parallel? That might be a clue.

Dave,

Actually, I am running a version of the nxge driver which uses only one TX ring, with LLTX not enabled, so the driver does single-threaded TX. On the other hand, uperf (or iperf, netperf) is running multiple TX connections in parallel, and the connections are bound to multiple processors, hence they are running in parallel.

Regards
Matheos
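
For illustration only, here is a minimal user-space sketch of the setup described above: several senders running in parallel, all feeding one TX ring through a single lock, so the transmit path itself is single threaded. It is a toy model, not nxge or kernel code, and it does not reproduce the lockup. Every name in it (SingleTxRing, tx_lock, queue_xmit, start_xmit, NUM_SENDERS, PACKETS_PER_SENDER) is hypothetical, and the lock only stands in for the per-device TX lock that the stack is assumed to take when a driver does not set LLTX.

```python
import threading

NUM_SENDERS = 4            # hypothetical: one sender per CPU-bound connection
PACKETS_PER_SENDER = 1000  # hypothetical workload size


class SingleTxRing:
    """Toy model of a device with one TX ring and a non-LLTX driver."""

    def __init__(self):
        self.tx_lock = threading.Lock()  # stand-in for the per-device TX lock
        self.ring = []                   # the single TX ring
        self.in_xmit = 0                 # senders currently inside start_xmit
        self.max_in_xmit = 0             # highest concurrency observed there

    def start_xmit(self, packet):
        # Only called with tx_lock held, so at most one sender is in here.
        self.in_xmit += 1
        self.max_in_xmit = max(self.max_in_xmit, self.in_xmit)
        self.ring.append(packet)
        self.in_xmit -= 1

    def queue_xmit(self, packet):
        # The caller takes the lock; all other senders wait at this point.
        with self.tx_lock:
            self.start_xmit(packet)


def sender(dev, sender_id):
    # Each sender models one connection bound to its own processor.
    for seq in range(PACKETS_PER_SENDER):
        dev.queue_xmit((sender_id, seq))


def run():
    dev = SingleTxRing()
    threads = [
        threading.Thread(target=sender, args=(dev, i))
        for i in range(NUM_SENDERS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dev


if __name__ == "__main__":
    dev = run()
    print(len(dev.ring), dev.max_in_xmit)
```

The senders overlap everywhere except inside start_xmit, which is the contention pattern the question about parallel TX is probing: the more processors transmit at once, the longer each one may wait on the single lock.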