Thread: 36 messages, 8 authors, 2008-03-29

Re: 2.6.24 BUG: soft lockup - CPU#X

From: Matheos Worku <hidden>
Date: 2008-03-28 00:20:58

David Miller wrote:
> From: Matheos Worku <redacted>
> Date: Thu, 27 Mar 2008 16:45:06 -0700
>
>> Brandeburg, Jesse wrote:
>>> Jarek Poplawski wrote:
>>>> On Wed, Mar 26, 2008 at 01:26:00PM -0700, Matheos Worku wrote:
>>>> ...
>>>>> nsn57-110 login: BUG: soft lockup - CPU#2 stuck for 11s! ...
>>>>> Call Trace:
>>>>> [<ffffffff803ef5f6>] __skb_clone+0x24/0xdc
>>>>> [<ffffffff803f152e>] skb_realloc_headroom+0x30/0x63
>>>>> [<ffffffff882edd40>] :niu:niu_start_xmit+0x114/0x5af
>>>>> [<ffffffff80221995>] gart_map_single+0x0/0x70
>>>>> [<ffffffff803f5e2b>] dev_hard_start_xmit+0x1d2/0x246 ...
>>>>
>>>> Maybe I'm wrong with this again, but I wonder about this
>>>> gart_map_single on almost all traces, and probably not supposed to be
>>>> seen here. Did you try with some memory re-config/debugging?
>>>
>>> I have some more examples of this but with the ixgbe driver.  We are
>>> running heavy bidirectional stress with multiple rx (non-napi, yeah I
>>> know) interrupts by default (and userspace irqbalance is probably on,
>>> I'll have the lab try it without)
>>
>> I have seen the lockup on kernels 2.6.18 and newer mostly on TX traffic.
>> I have seen it on another 10G driver (off the tree niu driver sibling,
>> nxge).  The nxge driver doesn't use any TX interrupts and I have seen it
>> with UDP TX, irqbalance disabled, with no irq activity at all.  some
>> example traces included.
>
> Interesting.
>
> Are you running uperf in a way such that there are multiple
> processors doing TX's in parallel?  That might be a clue.

Dave,
Actually, I am running a version of the nxge driver that uses only one
TX ring with LLTX disabled, so the driver does single-threaded TX. On the
other hand, uperf (or iperf, netperf) is running multiple TX
connections in parallel, and the connections are bound to multiple
processors, so they do run in parallel.

Regards
Matheos
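
For readers following the thread, the setup described above (several senders running on different processors, all funneled into a single TX ring that only one caller may touch at a time) can be sketched in user space. This is only an illustration under stated assumptions: `tx_ring`, `fake_start_xmit`, and `run_tx_simulation` are hypothetical names, a pthread mutex stands in for the kernel's TX lock, and nothing here is taken from the nxge or niu sources or reproduces the lockup itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SENDERS 64

/* Hypothetical stand-in for one hardware TX ring; not the nxge layout. */
struct tx_ring {
    pthread_mutex_t lock;   /* plays the role of the per-device TX lock */
    unsigned long queued;   /* packets placed on the ring so far */
};

/* One simulated sending connection (think: one uperf/iperf stream). */
struct sender {
    struct tx_ring *ring;
    unsigned long count;    /* packets this sender will transmit */
};

/* Hypothetical transmit routine: callers are serialized on the ring lock. */
static void fake_start_xmit(struct tx_ring *ring)
{
    pthread_mutex_lock(&ring->lock);
    ring->queued++;
    pthread_mutex_unlock(&ring->lock);
}

static void *sender_main(void *arg)
{
    struct sender *s = arg;

    for (unsigned long i = 0; i < s->count; i++)
        fake_start_xmit(s->ring);
    return NULL;
}

/*
 * Start nsenders threads that each push per_sender packets through the
 * single shared ring, wait for them, and return the ring's packet count.
 */
unsigned long run_tx_simulation(int nsenders, unsigned long per_sender)
{
    struct tx_ring ring = { .queued = 0 };
    struct sender senders[MAX_SENDERS];
    pthread_t threads[MAX_SENDERS];
    int started = 0;

    assert(nsenders > 0 && nsenders <= MAX_SENDERS);
    pthread_mutex_init(&ring.lock, NULL);

    for (int i = 0; i < nsenders; i++) {
        senders[i].ring = &ring;
        senders[i].count = per_sender;
        if (pthread_create(&threads[i], NULL, sender_main, &senders[i]) != 0)
            break;          /* join only the threads that actually started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&ring.lock);
    return ring.queued;
}
```

Calling `run_tx_simulation(4, 10000)` models four parallel connections sharing one ring. The senders run concurrently, but every transmit is serialized on the one lock, which is the sense in which the driver does single-threaded TX even though the workload is parallel.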