Re: [PATCH 2/2] net: dsa: microchip: Provide Module 4 KSZ9477 errata (DS80000754C)
From: Lukasz Majewski <lukma@denx.de>
Date: 2023-08-29 12:39:43
Also in:
lkml
Hi Oleksij,
Hi Lukasz,
On Tue, Aug 29, 2023 at 01:24:29PM +0200, Lukasz Majewski wrote:
Hi Vladimir,
Hi Lukasz,
On Tue, Aug 29, 2023 at 10:35:33AM +0200, Lukasz Majewski wrote:
Hi Vladimir,
On Fri, Aug 25, 2023 at 06:48:41PM +0000, Tristram.Ha@microchip.com wrote:
IMHO adding functions for MMD modification would facilitate further development (for example LED setup).

We already have some KSZ9477 specific initialization done in the Micrel PHY driver under drivers/net/phy/micrel.c, can we converge on the PHY driver which has a reasonable amount of infrastructure for dealing with workarounds, indirect or direct MMD accesses etc.?

Actually the internal PHYs used in the KSZ9897/KSZ9477/KSZ9893 switches are special and only used inside those switches. Putting all the switch related code in the Micrel PHY driver does not really help. When the switch is reset all those PHY registers need to be set again, but the PHY driver only executes that code during PHY initialization. I do not know if there is a good way to tell the PHY to re-initialize again.

Suppose there was a method to tell the PHY driver to re-initialize itself. What would be the key points in which the DSA switch driver would need to trigger that method? Where is the switch reset at runtime?

Tristram has explained why adding the internal switch PHY errata to generic PHY code is not optimal.

Yes, and I didn't understand that explanation, so I asked a clarification question.

Ok. Let's wait for Tristram's answer.
If adding MMD generic code is a problem - then I'm fine with just clearing the proper bits with just two indirect writes in drivers/net/dsa/microchip/ksz9477.c. I would also prefer to keep the separate ksz9477_errata() function, so we could add other errata code there. Just informative - without this patch the KSZ9477-EVB board's network is useless when the other peer has EEE enabled by default (like almost all non-managed ETH switches).

No, adding direct PHY MMD access code to the ksz9477 switch driver is not even the biggest problem - even though, IIUC, the "workaround" to disable EEE advertisement could be moved to ksz9477_get_features() in drivers/net/phy/micrel.c, where phydev->supported_eee could be cleared.

To be even more interesting (after looking into the PHY micrel.c code):
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
The errata from this patch is already present. The issue is that ksz9477_config_init() (drivers/net/phy/micrel.c) is executed AFTER the generic phy_probe():
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy_device.c#L3256
in which the EEE advertisement registers are read. Hence, those registers need to be cleared earlier - as I do in ksz9477_setup() in drivers/net/dsa/microchip/ksz9477. Here the precedence matters ...
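The ordering problem described above can be illustrated with a small stand-alone sketch. This is not kernel code: `struct sim_phy`, `sim_write()`, `sim_read()`, `step_probe()`, `step_errata()` and `eee_seen_by_probe()` are hypothetical names that only simulate a PHY in user space. The register numbers are assumed to follow the standard Clause 22 indirect MMD access scheme (registers 0x0D/0x0E) and the standard EEE advertisement register (MMD 7, register 0x3C). `step_probe()` stands in for the point where phy_probe() reads the EEE advertisement, and `step_errata()` for the code that clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clause 22 registers used for indirect MMD access (assumed standard layout). */
#define REG_MMD_CTRL  0x0D    /* MMD access control */
#define REG_MMD_DATA  0x0E    /* MMD access address/data */
#define MMD_FUNC_ADDR 0x0000  /* next REG_MMD_DATA write sets the register address */
#define MMD_FUNC_DATA 0x4000  /* REG_MMD_DATA accesses data, no post-increment */

#define MMD_AN        7       /* auto-negotiation MMD */
#define AN_EEE_ADV    0x3C    /* EEE advertisement register */
#define EEE_100TX     0x0002
#define EEE_1000T     0x0004

/* Hypothetical simulated PHY holding only the one MMD register of interest. */
struct sim_phy {
    uint16_t mmd_ctrl;
    uint16_t mmd_addr;
    uint16_t eee_adv;     /* MMD 7, register 0x3C */
    uint16_t cached_adv;  /* value remembered by the "probe" step */
};

static bool sim_selected(const struct sim_phy *p)
{
    return (p->mmd_ctrl & 0xC000) == MMD_FUNC_DATA &&
           (p->mmd_ctrl & 0x001F) == MMD_AN && p->mmd_addr == AN_EEE_ADV;
}

static void sim_write(struct sim_phy *p, uint8_t reg, uint16_t val)
{
    if (reg == REG_MMD_CTRL)
        p->mmd_ctrl = val;
    else if (reg == REG_MMD_DATA && (p->mmd_ctrl & 0xC000) == MMD_FUNC_ADDR)
        p->mmd_addr = val;
    else if (reg == REG_MMD_DATA && sim_selected(p))
        p->eee_adv = val;
}

static uint16_t sim_read(const struct sim_phy *p, uint8_t reg)
{
    return (reg == REG_MMD_DATA && sim_selected(p)) ? p->eee_adv : 0;
}

/* Select devad/addr through the two indirect registers. */
static void mmd_select(struct sim_phy *p, uint8_t devad, uint16_t addr)
{
    assert(devad < 32);
    sim_write(p, REG_MMD_CTRL, MMD_FUNC_ADDR | devad);
    sim_write(p, REG_MMD_DATA, addr);
    sim_write(p, REG_MMD_CTRL, MMD_FUNC_DATA | devad);
}

/* Stand-in for the probe step: snapshot the EEE advertisement. */
static void step_probe(struct sim_phy *p)
{
    mmd_select(p, MMD_AN, AN_EEE_ADV);
    p->cached_adv = sim_read(p, REG_MMD_DATA);
}

/* Stand-in for the errata step: clear the EEE advertisement bits. */
static void step_errata(struct sim_phy *p)
{
    mmd_select(p, MMD_AN, AN_EEE_ADV);
    uint16_t v = sim_read(p, REG_MMD_DATA);
    sim_write(p, REG_MMD_DATA, v & (uint16_t)~(EEE_100TX | EEE_1000T));
}

/* Returns the EEE advertisement the probe step saw for a given ordering. */
uint16_t eee_seen_by_probe(bool errata_first)
{
    struct sim_phy p = { .eee_adv = EEE_100TX | EEE_1000T };

    if (errata_first) {
        step_errata(&p);
        step_probe(&p);
    } else {
        step_probe(&p);
        step_errata(&p);
    }
    return p.cached_adv;
}
```

In this model, running the errata step first leaves the probe with no EEE modes cached, while running it afterwards leaves the stale 100BASE-TX and 1000BASE-T EEE bits in the cached value even though the register itself has been cleared.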
The biggest problem that I see is that Oleksij Rempel has "just" added EEE support to the KSZ9477 earlier this year, with an ack from Arun Ramadoss: 69d3b36ca045 ("net: dsa: microchip: enable EEE support"). I'm not understanding why the erratum wasn't a discussion topic then.

+1

As this erratum states: "this feature _can_ cause link drops". For example, I was indeed able to have an EEE-related issue between this switch and a link partner with an AR8035 PHY. The following patch addresses this issue:
https://lore.kernel.org/all/20230327142202.3754446-8-o.rempel@pengutronix.de/
So, in this case KSZ9477 was not the bad side.
The errata:
http://ww1.microchip.com/downloads/jp/DeviceDoc/jp599888.pdf

Module 4, "End user implications":
--------8<----------
If the link partner is not known, or if the link partner is EEE capable, then the EEE feature should be manually disabled to avoid link drop problems.
-------->8----------
Since this erratum does not describe the exact cause of this issue
IMHO, it does - "The EEE feature is enabled by default, but it is not fully operational." It looks like some silicon issue, the details of which are probably known only to Micrel/Microchip.
or specific link partners where this functionality is not working, I would prefer to give the user the freedom of choice.
The problem is that the user would encounter a broken network when connected to a peer advertising EEE. Hence, I would prefer to apply the errata and then somebody who would like to enable EEE can try if it works for him. IMHO, code to fix errata shall be added unconditionally, without any "freedom of choice".
We have the same issue with Pause Frame support. It is not always a good choice, but the user has the freedom to configure it. Today I want to create a test setup with different EEE capable link partners on one side and KSZ9477 on the other side and let it run for some days. Just to make sure. Besides, are you able to reproduce this issue?
Yes, I can reproduce the issue. I use two Microchip development boards (KSZ9477-EVB [1]) connected together to test HSR as well as communication with a HOST PC. The network on this board without this patch is not usable (I continually encounter link ups/downs). Another test scenario is to connect this board to a non-managed ETH switch (which shall have EEE advertised by default). Please be also aware that this errata fix is (implicitly, I think) already present in the kernel:
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/micrel.c#L1804
However, the execution order of PHY/DSA functions in the newest mainline means it no longer works (I've described it in detail in the earlier mail to Vladimir).
Regards, Oleksij
Links:
[1] - https://www.microchip.com/en-us/development-tool/evb-ksz9477-1

Best regards,

Lukasz Majewski

--
DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de