Thread (44 messages), 7 authors, 2014-12-16

Oops: 17 SMP ARM (v3.16-rc2)

From: Russell King - ARM Linux <hidden>
Date: 2014-08-06 09:50:28
Also in: lkml

On Tue, Aug 05, 2014 at 01:31:29PM +0000, Mattis Lorentzon wrote:
> We have applied your V2 patch set of 30 patches on top of v3.16-rc2 and are
> currently running some stability tests.
>
> During our first test round we triggered a timeout which caused the fec driver
> to become unresponsive for several minutes. The attached backtrace was
> shown when the hardware was rebooted.
What is on the other end of the link?
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x270/0x27c()
> NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
> ...
> fec 2188000.ethernet eth0: TX ring dump
> Nr     SC     addr       len  SKB
>   0    0x1c00 0x00000000   66   (null)
> ...
>  83    0x1c00 0x00000000   66   (null)
>  84  H 0x1c00 0x00000000   66   (null)
>  85    0x9c00 0x2e205000   66 9e384f00
>  86    0x1c00 0x2e204800   66 9e384d80
>  87    0x1c00 0x2e204000   66 9e384180
> ...
> 376    0x1c00 0x2e252800   66 81cf6180
> 377    0x1c00 0x2e253000   66 81cf6240
> 378 S  0x1c00 0x00000000   66   (null)
So, the software would insert the next packet into slot 378.  However,
the slots from 85 to 377 have not been reaped, despite those in 86 to
377 allegedly having been sent.  This is because the entry in slot 85
shows that it has yet to be sent.
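
To illustrate why one stuck entry holds back everything behind it, here is
a minimal sketch of an in-order reap loop. It is not the fec driver's code:
the names (tx_desc, reap_tx_ring, TX_READY) are made up for this example,
and treating 0x8000 as the "not yet sent" bit is an assumption inferred
from the dump, where slot 85 reads 0x9c00 and its neighbours read 0x1c00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the fec driver. */
#define TX_READY 0x8000u  /* assumed: set while the entry is waiting to be sent */

struct tx_desc {
    uint16_t sc;          /* status/control word, the "SC" column in the dump */
};

/*
 * Reap completed descriptors strictly in ring order, starting at *dirty
 * and stopping before 'next' (the slot software will fill next).
 * Returns the number of descriptors reaped and advances *dirty.
 */
static unsigned int reap_tx_ring(const struct tx_desc *ring, size_t ring_size,
                                 size_t *dirty, size_t next)
{
    unsigned int reaped = 0;

    assert(ring_size != 0);

    while (*dirty != next) {
        if (ring[*dirty].sc & TX_READY)
            break;        /* not sent yet: nothing after it can be reaped */
        *dirty = (*dirty + 1) % ring_size;
        reaped++;
    }
    return reaped;
}
```

With the dump's values (slot 85 at 0x9c00, slots 86 to 377 at 0x1c00, next
slot 378), a loop of this shape reaps nothing, since it stops at slot 85 on
the first test.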

I've no idea what causes this; it looks like there's something screwed
with the hardware which causes the transmitter to skip an entry in the
ring under certain circumstances.  As I've never been able to reproduce
it here, I've not been able to investigate it.

What I would like to do is to stamp each packet in some way with an
identifier marking its ring position, and then monitor the network to
find out whether the packet at slot 85 was actually transmitted - that's
made slightly harder because packets may be dropped at the receiver
when operating in promisc mode.  This would then allow us to work out
some likely causes.
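
A rough sketch of what such stamping could look like, purely as an
illustration. The marker string, the offset handling and both function
names are hypothetical, and this assumes test traffic with spare payload
bytes; any checksum covering those bytes would have to be recomputed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAMP_MAGIC "RING"  /* hypothetical 4-byte marker to search for in captures */
#define STAMP_LEN   6       /* 4 marker bytes + 16-bit ring index */

/*
 * Write the marker followed by the ring index (big-endian) at byte
 * offset 'off' of the packet. Returns 0 on success, -1 if the stamp
 * does not fit inside the packet.
 */
static int stamp_ring_index(uint8_t *pkt, size_t len, size_t off, uint16_t index)
{
    if (off > len || len - off < STAMP_LEN)
        return -1;
    memcpy(pkt + off, STAMP_MAGIC, 4);
    pkt[off + 4] = (uint8_t)(index >> 8);
    pkt[off + 5] = (uint8_t)(index & 0xff);
    return 0;
}

/*
 * Capture side: recover the ring index from a packet stamped as above.
 * Returns the index, or -1 if the marker is absent or out of bounds.
 */
static int read_ring_index(const uint8_t *pkt, size_t len, size_t off)
{
    if (off > len || len - off < STAMP_LEN)
        return -1;
    if (memcmp(pkt + off, STAMP_MAGIC, 4) != 0)
        return -1;
    return (pkt[off + 4] << 8) | pkt[off + 5];
}
```

A capture that contains stamps for slots 84 and 86 but none for 85 would
suggest the transmitter really did skip the entry, whereas a stamp for 85
on the wire would suggest the packet went out without its status being
updated.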

Note that after the transmit watchdog, the interface should recover and
start operating normally again - and that should not take "several
minutes."

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.