Thread: 40 messages, 3 authors, 2020-01-11

Re: [drivers/net/phy/sfp] intermittent failure in state machine checks

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: 2020-01-09 21:38:55

On Thu, Jan 09, 2020 at 07:42:27PM +0000, ѽ҉ᶬḳ℠ wrote:
On 09/01/2020 19:01, ѽ҉ᶬḳ℠ wrote:
On 09/01/2020 17:43, Russell King - ARM Linux admin wrote:
On Thu, Jan 09, 2020 at 05:35:23PM +0000, ѽ҉ᶬḳ℠ wrote:
Thank you for the extensive feedback and explanation.

Pardon for having mixed up the semantics on module specifications vs.
EEPROM dump...

The module (chipset) has been designed by Metanoia. I am not sure who
the actual manufacturer is, and it has probably just been branded Allnet.
The designer provides some proprietary management software (called EBM)
to their wholesale buyers only.
I have one of their early MT-V5311 modules, but it has no accessible
EEPROM, and even if it did, it would be of no use to me being
unapproved for connection to the BT Openreach network.  (BT SIN 498
specifies non-standard power profile to avoid crosstalk issues with
existing ADSL infrastructure, and I believe they regularly check the
connected modem type and firmware versions against an approved list.)

I haven't noticed the module I have asserting its TX_FAULT signal,
but then its RJ45 has never been connected to anything.
The curious (and sort of inexplicable) thing is that the module in
general works, i.e. at some point it must pass the sm checks, or
connectivity would be failing constantly and the module would be
generally unusable.

The reported issues, however, are intermittent, and usually reliably
reproducible with

ifdown <iface> && ifup <iface>

or rebooting the router that hosts the module.

If some time passes between ifdown and ifup (I am not sure how much, but
it seems to be in excess of 3 minutes), the sm checks mostly do not fail.
It somehow "feels" as if the module is storing some link signal
information in a register which does not suit the sm check routine, and
only when that register clears does the sm check routine pass and
connectivity get restored.
____

Since there are probably other such SFP modules out there, xDSL and
G.fast, that provide no laser safety circuitry by design (since they do
not provide connectivity over fibre), would it perhaps make sense to
check for the existence of laser safety circuitry first, before
getting to the sm checks?
____
I am wondering whether what is mentioned in
https://gitlab.labs.nic.cz/turris/turris-build/issues/89 is perhaps the
cause of the issue:

Even when/after the SFP module is recognized and the link mode is set for
the NIC to the proper value there can still be the link-up signal mismatch
that we have seen on many non-ethernet SFPs. The thing is that one of the
SFP pins is called LOS (loss of signal) and when the pin is in active state
it is being interpreted by the Linux kernel as "link is down", turn off the
NIC. Unfortunately we have seen a chicken-and-egg problem with some GPON and
DSL SFPs - the SFP does not come up and deassert LOS unless there is SGMII
link from NIC and NIC is not coming up unless LOS is deasserted.
That would be very very broken behaviour, but one which the kernel
doesn't care about.

If RX_LOS is active, we do *not* disable the NIC. We just use RX_LOS as
an additional input to evaluating whether the link is up.  The NIC will
still be configured for the appropriate mode irrespective of the state
of RX_LOS.
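
The distinction above can be illustrated with a minimal sketch. This is
not the kernel's sfp or phylink code: the struct, its fields, and both
function names are hypothetical and only model the behaviour described,
namely that NIC configuration does not depend on RX_LOS while link-up
evaluation takes it as one more input.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs; not the kernel's data structures. */
struct link_inputs {
	bool mac_link_up;	/* MAC/PCS reports a link to the module */
	bool rx_los;		/* module is asserting RX_LOS */
};

/* The NIC is configured for the interface mode whatever RX_LOS says. */
static bool nic_should_be_configured(const struct link_inputs *in)
{
	(void)in;	/* RX_LOS deliberately plays no part here */
	return true;
}

/* RX_LOS is only an additional input when deciding if the link is up. */
static bool link_is_up(const struct link_inputs *in)
{
	return in->mac_link_up && !in->rx_los;
}
```

With RX_LOS asserted, this model reports the link as down but still
leaves the NIC configured.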

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up