Thread (6 messages), 2 authors, 2013-06-03

Re: BQL-related tg3 transmit timeout on 5720 / Dell R720

From: Nithin Nayak Sujir <hidden>
Date: 2013-05-30 14:38:59


On 5/30/2013 2:05 AM, Roland Dreier wrote:
On Wed, May 22, 2013 at 3:02 PM, Roland Dreier [off-list ref] wrote:
I'll try to find a kernel where tg3 works on this system so I can bisect.
So I finally was able to successfully bisect our problem with tg3
transmit timeouts with recent kernels.  Recall this was on _some_
of our Dell R720 systems with 4X tg3 ethernet with devices like:

     tg3 0000:02:00.0: eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI
Express) MAC address 90:b1:1c:3f:46:b8
     tg3 0000:02:00.0: eth0: attached PHY is 5720C (10/100/1000Base-T
Ethernet) (WireSpeed[1], EEE[1])

The bisection came down to

     commit 298376d3e8f00147548c426959ce79efc47b669a
     Author: Tom Herbert [off-list ref]
     Date:   Mon Nov 28 08:33:30 2011

         tg3: Support for byte queue limits

         Changes to tg3 to use byte queue limits.
[...]
and each send completes in turn.

For now I can work around the issue by hacking BQL out of tg3 in our
kernel, but I guess it would be good to understand this tg3-specific
issue of sends not completing and handle that in the tg3 driver.

Thanks for the bisect and detailed analysis. I will investigate this
further.

I have a system that reproduces this very reliably, so let me know if
there is any further logging or other info that would help understand
this further.

Is the 5720 a NIC or a LOM? If it's a NIC, would it be possible to try it
on a different system to see if the behaviour depends on the system at all?

Thanks,
   Roland
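
For anyone gathering further information on a stall like the one discussed in this thread, kernels built with BQL expose per-queue state under sysfs in a byte_queue_limits directory for each TX queue. The following is a minimal sketch of a script that collects those values. It assumes the interface is named eth0 as in the log lines above, and read_bql_state is a hypothetical helper name, not part of any kernel tool or of the tg3 driver.

```python
from pathlib import Path

# Attribute files the kernel exposes per TX queue when BQL is enabled.
BQL_ATTRS = ("limit", "limit_min", "limit_max", "hold_time", "inflight")


def read_bql_state(iface="eth0", sysfs_root="/sys/class/net"):
    """Return {queue_name: {attr: int}} for every TX queue of iface.

    Hypothetical helper for illustration; sysfs_root is a parameter only
    so the function can be pointed at a test directory.
    """
    queues_dir = Path(sysfs_root) / iface / "queues"
    state = {}
    for queue in sorted(queues_dir.glob("tx-*")):
        bql_dir = queue / "byte_queue_limits"
        if not bql_dir.is_dir():
            continue  # no BQL directory for this queue
        state[queue.name] = {
            attr: int((bql_dir / attr).read_text().strip())
            for attr in BQL_ATTRS
            if (bql_dir / attr).is_file()
        }
    return state


if __name__ == "__main__":
    # Print one line per TX queue; prints nothing if eth0 is absent.
    for name, attrs in read_bql_state("eth0").items():
        print(name, attrs)
```

Sampling this a few times while the problem is occurring might show whether inflight on a queue stays non-zero without draining, which would be consistent with sends not completing. That reading is an assumption here, not something established in the thread.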