Thread: 40 messages, 3 authors, 2020-01-11

Re: [drivers/net/phy/sfp] intermittent failure in state machine checks

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: 2020-01-10 11:44:40

On Fri, Jan 10, 2020 at 09:50:00AM +0000, ѽ҉ᶬḳ℠ wrote:
On 10/01/2020 09:27, Russell King - ARM Linux admin wrote:
On Thu, Jan 09, 2020 at 11:50:14PM +0000, ѽ҉ᶬḳ℠ wrote:
On 09/01/2020 23:10, Russell King - ARM Linux admin wrote:
Please don't use mii-tool with SFPs that do not have a PHY; the "PHY"
registers are emulated, and are there just for compatibility. Please
use ethtool in preference, especially for SFPs.
Sure, but ethtool is not much help for this particular matter; all there
is is ethtool -m, and according to you the EEPROM dump is not to be relied on.
How about just "ethtool eth2" ?
Settings for eth2:
        Supported ports: [ TP ]
        Supported link modes:   1000baseX/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  1000baseX/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes
That looks fine.
CONFIG_DEBUG_GPIO is not the same as having debugfs support enabled.
If debugfs is enabled, then gpiolib will provide the current state
of gpios through debugfs.  debugfs is normally mounted on
/sys/kernel/debug, but may not be mounted by default depending on
policy.  Looking in /proc/filesystems will tell you definitively
whether debugfs is enabled or not in the kernel.
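As a rough illustration of the check described above, here is a minimal Python sketch. It assumes the usual layout of /proc/filesystems (an optional "nodev" column followed by the filesystem type) and of /proc/mounts (filesystem type in the third field). The function names are made up for this example.

```python
def has_debugfs(proc_filesystems_text: str) -> bool:
    """Return True if 'debugfs' is listed in /proc/filesystems content."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Lines look like "nodev<TAB>debugfs" or "<TAB>ext4";
        # the filesystem type is always the last field.
        if fields and fields[-1] == "debugfs":
            return True
    return False


def debugfs_mount_points(proc_mounts_text: str) -> list:
    """Return the mount points of every debugfs entry in /proc/mounts content."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Assumed field order: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points
```

On a live system the inputs would come from open("/proc/filesystems").read() and open("/proc/mounts").read(). An empty list from the second function with True from the first would mean debugfs is built in but not mounted.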
debugfs is mounted, but ls -af /sys/kernel/debug/gpio only produces
(oddly):

/sys/kernel/debug/gpio
Try "cat /sys/kernel/debug/gpio"
gpiochip2: GPIOs 504-511, parent: i2c/8-0071, pca9538, can sleep:
 gpio-504 (                    |tx-fault            ) in  lo IRQ
 gpio-505 (                    |tx-disable          ) out lo
 gpio-506 (                    |rate-select0        ) in  lo
 gpio-507 (                    |los                 ) in  lo IRQ
 gpio-508 (                    |mod-def0            ) in  lo IRQ
Which is also indicating everything is correct.  When the problem
occurs, check the state of the signals again as close as possible
to the event - it depends how long the transceiver keeps it
asserted.  You will probably find tx-fault is indicating
"in  hi IRQ".
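To catch the signal state close to the event, the debugfs output could be polled and parsed by a small script. The following is only a sketch: it assumes the line format shown in the dump above (which can differ between kernel versions), and the function names are invented for illustration.

```python
import re

# Matches lines such as:
#  gpio-504 (                    |tx-fault            ) in  lo IRQ
GPIO_LINE = re.compile(
    r"^\s*gpio-(?P<num>\d+)\s*\((?P<label>[^|]*)\|(?P<consumer>[^)]*)\)\s+"
    r"(?P<direction>in|out)\s+(?P<level>hi|lo)\b"
)


def parse_gpio_states(text):
    """Map each consumer name (e.g. 'tx-fault') to a (direction, level) tuple."""
    states = {}
    for line in text.splitlines():
        match = GPIO_LINE.match(line)
        if match:
            consumer = match.group("consumer").strip()
            states[consumer] = (match.group("direction"), match.group("level"))
    return states


def tx_fault_asserted(text):
    """Return True if a 'tx-fault' line is present and reads 'hi'."""
    state = parse_gpio_states(text).get("tx-fault")
    return state is not None and state[1] == "hi"
```

Feeding it the contents of /sys/kernel/debug/gpio in a loop, and logging a timestamp whenever tx_fault_asserted() flips to True, would show how long the transceiver holds the signal.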
Meanwhile Allnet responded, which basically amounts to blame ping-pong (it
is not me, go and look there instead...):

- driver support is not being handled by Allnet but by Metanoia, the latter
being the designer and manufacturer
- Allnet does not have the buying power to persuade Metanoia to look into
the matter
... which is pretty standard; no one will rework their SFP unless
they fear their sales will be severely impacted by the issue.
- it would appear that sfp.c is trying to communicate with a fiber GBIC and
fails since the signal reports may not be 100% compatible
That's a fun claim, but note carefully the wording "may" which implies
some uncertainty in the statement.

Let's look at the wording of the GBIC (SFF-8053) and SFP (INF-8074 -
SFP MSA) documents.  The wording for the "fault recovery" is identical
between the two, which concerns what happens when TX_FAULT is asserted
and how to recover from that.

Concerning the implementation of TX_FAULT, SFF-8053 states:

  If no transmitter safety circuitry is implemented, the TX_FAULT signal
  may be tied to its negated state.

but then says later in the document:

  If TX_FAULT is not implemented, the signal shall be held to the low
  state by the GBIC.

Meanwhile, INF-8074 similarly states:

  If no transmitter safety circuitry is implemented, the TX_FAULT signal
  may be tied to its negated state.

but later on has a similar statement:

  TX_FAULT shall be implemented by those module definitions of SFP
  transceiver supporting safety circuitry. If TX_FAULT is not
  implemented, the signal shall be held to the low state by the SFP
  transceiver.

"shall" in both cases is stronger than "may".  So, there seems to be
little difference between the GBIC and SFP usage of this signal.

Their claim is that sfp.c implements the older GBIC style of signal
reports.  My counter-claim is that (a) sfp.c is written to the SFP MSA
and not the GBIC standard, and (b) there is no difference as far as the
TX_FAULT signal is concerned between the GBIC standard and the SFP MSA.

But... it doesn't matter that much, there's a module out there (and it
isn't the only one) which does "funny stuff" with its TX_FAULT signal.
Either we decide we want to support it and implement a quirk, or we
decide we don't want to support it.

There is an option bit in the EEPROM that is supposed to indicate
whether the module supports TX_FAULT, but, as you can guess, there are
problems with using that, as:

1) there are a lot of modules, particularly optical modules, that
   implement TX_FAULT correctly but don't set the option bit to say
   that they support the signal.

2) the other module I'm aware of that does "funny stuff" with its
   TX_FAULT signal does have the TX_FAULT option bit set.

So, the option bit is completely untrustworthy and, therefore, is
meaningless (which is why we don't use it.)
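For anyone who wants to see what a given module claims, here is a minimal Python sketch of reading that option bit from a raw EEPROM dump (for example one saved from ethtool -m). It assumes the SFF-8472 A0h layout, where the two-byte options field sits at bytes 64-65 and bit 3 of byte 65 means "TX_FAULT implemented". The constant and function names are made up for this example. As explained above, the result says what the module advertises, not how it behaves.

```python
# Assumption: SFF-8472 A0h page layout; options field at bytes 64-65,
# with bit 3 of byte 65 meaning "TX_FAULT implemented".
OPTIONS_LOW_BYTE_OFFSET = 65
TX_FAULT_IMPLEMENTED = 1 << 3


def claims_tx_fault(eeprom: bytes) -> bool:
    """Return True if the dump advertises TX_FAULT support in its options field."""
    if len(eeprom) <= OPTIONS_LOW_BYTE_OFFSET:
        raise ValueError("EEPROM dump too short to contain the options field")
    return bool(eeprom[OPTIONS_LOW_BYTE_OFFSET] & TX_FAULT_IMPLEMENTED)
```

Comparing this flag with the observed behaviour of the TX_FAULT GPIO across several modules is one way to reproduce the mismatch in both directions described in points 1 and 2.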

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up