Re: [RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open

From: Mason <hidden>
Date: 2017-07-31 14:08:31
Also in: linux-arm-kernel

On 31/07/2017 13:59, Måns Rullgård wrote:

> Mason writes:
>
>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>
>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>
>>>> I have identified a 100% reproducible flaw.
>>>> I have proposed a work-around that brings this down to 0
>>>> (tested 1000 cycles of link up / ping / link down).
>>>
>>> Can you also try to get help from your HW resources to eventually help
>>> you find out what is going on here?
>>
>> The patch I proposed /is/ based on the feedback from the HW team :-(
>> "Just reset the HW block, and everything will work as expected."
>
> Nobody is saying a reset won't recover the lockup.  The problem is that
> we don't know what caused it to lock up in the first place.  How do we
> know it can't happen during normal operation?  If we knew the cause, it
> might also be possible to avoid the situation entirely.

How does one prove that something "can't happen during normal operation"?

The "put adapter in loop-back mode so we can send ourselves fake packets"
shenanigans seem completely insane, if you ask me.

Other things make no sense to me. For example, nb8800_dma_stop()
contains a polling loop:

	do {
		mdelay(100);
		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
		wmb();
		mdelay(100);
		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);

		mdelay(5500);

		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
						rxcr, !(rxcr & RCR_EN),
						1000, 100000);
		printk("err=%d retry=%d\n", err, retry);
	} while (err && --retry);


(It was me who added the delays.)

*Whatever* delays I insert, it always goes 3 times through the loop.

[   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   35.364705] err=-110 retry=5
[   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   41.177822] err=-110 retry=4
[   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   46.890907] err=0 retry=3

How is that possible?

I've tried using spinlocks and delays to get parallel execution
down to a minimum, and I get the same logs on both boards.

Regards.
