Thread: 40 messages, 3 authors, 2020-01-11

Re: [drivers/net/phy/sfp] intermittent failure in state machine checks

From: ѽ҉ᶬḳ℠ <hidden>
Date: 2020-01-09 19:42:33

On 09/01/2020 19:01, ѽ҉ᶬḳ℠ wrote:
On 09/01/2020 17:43, Russell King - ARM Linux admin wrote:
On Thu, Jan 09, 2020 at 05:35:23PM +0000, ѽ҉ᶬḳ℠ wrote:
Thank you for the extensive feedback and explanation.

Pardon me for having mixed up the semantics of module specifications
vs. EEPROM dump...

The module (chipset) was designed by Metanoia; I am not sure who the
actual manufacturer is, and it has probably just been branded Allnet.
The designer provides some proprietary management software (called EBM)
to their wholesale buyers only.
I have one of their early MT-V5311 modules, but it has no accessible
EEPROM, and even if it did, it would be of no use to me being
unapproved for connection to the BT Openreach network.  (BT SIN 498
specifies a non-standard power profile to avoid crosstalk issues with
existing ADSL infrastructure, and I believe they regularly check the
connected modem type and firmware versions against an approved list.)

I haven't noticed the module I have asserting its TX_FAULT signal,
but then its RJ45 has never been connected to anything.
The curious (and somewhat inexplicable) thing is that the module
generally works, i.e. at some point it must pass the state machine (sm)
checks; otherwise connectivity would fail constantly and the module
would be generally unusable.

The reported issues, however, are intermittent, though usually reliably
reproducible with

ifdown <iface> && ifup <iface>

or rebooting the router that hosts the module.

If some time passes between ifdown and ifup (I am not sure how long, but
it seems to be more than 3 minutes), the sm checks mostly do not fail.
It somehow "feels" as if the module stores some link signal information
in a register that does not suit the sm check routine, and only when
that register clears does the sm check routine pass and connectivity
get restored.
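The timing observation could be checked more systematically. A minimal sketch, assuming a POSIX shell on the router: it repeats the ifdown/ifup cycle with increasing pauses and samples /sys/class/net/<dev>/carrier afterwards. The script name, the pause list and the 30-second settle time are assumptions, not values from this thread. On OpenWrt-style systems ifdown/ifup take the logical network name rather than the kernel device name, so both can be passed.

```sh
#!/bin/sh
# Hypothetical helper (sfp-bounce.sh): bounce an interface with increasing
# pauses and record whether carrier comes back afterwards.
# PAUSES and SETTLE are illustrative guesses; IFDOWN/IFUP/SLEEP/SYSFS
# can be overridden, e.g. for a dry run without real hardware.
PAUSES="${PAUSES:-0 60 120 180 240}"   # seconds between ifdown and ifup
SETTLE="${SETTLE:-30}"                 # seconds to wait after ifup
IFDOWN="${IFDOWN:-ifdown}"
IFUP="${IFUP:-ifup}"
SLEEP="${SLEEP:-sleep}"
SYSFS="${SYSFS:-/sys}"

bounce_once() {
    pause="$1"
    $IFDOWN "$IFACE"                   # logical interface name
    $SLEEP "$pause"
    $IFUP "$IFACE"
    $SLEEP "$SETTLE"
    # carrier reads 1 (link up) or 0 (link down); the read fails while the
    # device is administratively down, which is reported as n/a.
    carrier=$(cat "$SYSFS/class/net/$NETDEV/carrier" 2>/dev/null) || carrier="n/a"
    echo "pause=${pause}s carrier=${carrier}"
}

# Only run the loop when an interface is named:
#   ./sfp-bounce.sh <logical-iface> [<netdev>]
if [ "$#" -ge 1 ]; then
    IFACE="$1"
    NETDEV="${2:-$1}"                  # kernel device name, defaults to $1
    for p in $PAUSES; do
        bounce_once "$p"
    done
fi
```

For example, `./sfp-bounce.sh wan eth2` (both names hypothetical) prints one `pause=... carrier=...` line per cycle, which would show whether recovery really correlates with the length of the pause.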
____

Since there are probably other such SFP modules out there, xDSL and
G.fast, that by design do not provide laser safety circuitry (as they do
not provide connectivity over fibre), would it perhaps not make sense to
check for the existence of laser safety circuitry first, before getting
to the sm checks?
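Whether a module claims to implement TX_FAULT (and LOS) is advertised in its EEPROM, provided the EEPROM is readable at all, which is not the case for every such module. A minimal sketch of checking this from userspace, assuming the SFF-8472 bit layout of byte 65 and a NIC driver that supports `ethtool -m` module EEPROM dumps; the script and function names are hypothetical. It only shows what the module advertises, not whether the kernel's sfp state machine acts on it.

```sh
#!/bin/sh
# Hypothetical helper (sfp-options.sh): report which optional signals an SFP
# module advertises, assuming the SFF-8472 layout of byte 65 in the A0h
# EEPROM page: bit 3 = TX_FAULT implemented, bit 2 = LOS inverted,
# bit 1 = LOS implemented.
sfp_signal_options() {
    b="$1"                             # byte 65 as a decimal value, 0-255
    tx_fault=$(( (b >> 3) & 1 ))
    los_inverted=$(( (b >> 2) & 1 ))
    los=$(( (b >> 1) & 1 ))
    echo "tx_fault=$tx_fault los=$los los_inverted=$los_inverted"
}

# Read the byte with ethtool and decode it: ./sfp-options.sh <netdev>
if [ "$#" -ge 1 ]; then
    byte=$(ethtool -m "$1" offset 65 length 1 raw on | od -An -tu1 | tr -d ' \n')
    if [ -z "$byte" ]; then
        echo "could not read the module EEPROM of $1" >&2
        exit 1
    fi
    sfp_signal_options "$byte"
fi
```

The decoding is kept separate from the ethtool call so it can also be applied to a byte taken from an existing EEPROM dump.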
____
I am wondering whether the issue described in
https://gitlab.labs.nic.cz/turris/turris-build/issues/89 is perhaps the
cause:

Even when/after the SFP module is recognized and the link mode is set
for the NIC to the proper value, there can still be the link-up signal
mismatch that we have seen on many non-Ethernet SFPs. The thing is that
one of the SFP pins is called LOS (loss of signal), and when the pin is
in the active state it is interpreted by the Linux kernel as "link is
down, turn off the NIC". Unfortunately, we have seen a chicken-and-egg
problem with some GPON and DSL SFPs: the SFP does not come up and
deassert LOS unless there is an SGMII link from the NIC, and the NIC
does not come up unless LOS is deasserted.
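The circular dependency in that quote can be made concrete with a toy model. This is a minimal sketch under obvious simplifications, not how sfp.c or phylink are structured: at each step the host brings the NIC up only if LOS is clear, and the module clears LOS only once the NIC's SGMII link is up. The function name and the ignore_los switch are hypothetical.

```sh
#!/bin/sh
# Toy model (not kernel code) of the LOS chicken-and-egg problem.
# los=1 means the module asserts LOS; nic_up=1 means the NIC drives the
# SGMII link. ignore_los=1 models a host that does not gate the NIC on LOS
# (a hypothetical workaround).
simulate_link() {
    ignore_los="$1"
    steps="${2:-10}"
    los=1
    nic_up=0
    i=0
    while [ "$i" -lt "$steps" ]; do
        # Host side: bring the NIC up only when LOS is clear (or ignored).
        if [ "$ignore_los" -eq 1 ] || [ "$los" -eq 0 ]; then
            nic_up=1
        fi
        # Module side: deassert LOS only once SGMII from the NIC is up.
        if [ "$nic_up" -eq 1 ]; then
            los=0
        fi
        i=$((i + 1))
    done
    echo "nic_up=$nic_up los=$los"
}

simulate_link 0    # host honours LOS: prints nic_up=0 los=1
simulate_link 1    # host ignores LOS: prints nic_up=1 los=0
```

With LOS honoured, neither side ever moves, however many steps are simulated; ignoring LOS breaks the loop, at the cost of losing LOS as a genuine link-down indication.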


