Re: [net-next,05/14] net: stmmac: add stmmac core serdes support
From: Vladimir Oltean <olteanv@gmail.com>
Date: 2026-01-21 16:23:50
Also in:
linux-arm-kernel, linux-arm-msm, linux-phy
On Wed, Jan 21, 2026 at 02:46:42PM +0000, Russell King (Oracle) wrote:
> On Tue, Jan 20, 2026 at 02:11:14PM +0200, Vladimir Oltean wrote:
> > On Tue, Jan 20, 2026 at 10:12:46AM +0000, Russell King (Oracle) wrote:
> > > First, I'll say I'm on a very short fuse today; no dinner last night, at the hospital up until 5:30am, and a fucking cold caller rang the door bell at 10am this morning. Just fucking our luck.
> >
> > Sorry to hear that.
> >
> > > On Tue, Jan 20, 2026 at 10:18:44AM +0200, Vladimir Oltean wrote:
> > > > Isn't it sufficient to set pl->pcs to NULL when pcs_enable() fails and after calling pcs_disable(), though?
> > >
> > > No. We've already called mac_prepare(), pcs_pre_config(), pcs_post_config() by this time, we're past the point of being able to unwind.
> >
> > I'm set out to resolve a much smaller problem. Calling it a full "unwind" is perhaps a bit much, because pcs_pre_config() and pcs_post_config() don't have unwinding equivalents, unlike how pcs_enable() has pcs_disable(). I don't see what API convention would be violated if phylink decided to drop a PCS whose enable() returned an error.
>
> While pcs_pre_config() and pcs_post_config() do not have unwinding equivalents (what would they be?) the issue here is that these could have changed any state that isn't simply undone by calling pcs_disable(). For example, pcs_pre_config() could have reprogrammed signal routing, clocking, or power supplies to blocks.
>
> This already applies to Marvell DSA pcs-639x.c, where the pre/post config hooks change the power state of the PCS block (for errata handling), and the only way that gets undone is via a call to pcs_disable() which explicitly disables IRQs and power for the PCS. Its pcs_disable() isn't a strict reversal of pcs_enable(), it does more.
>
> We already declare the interface to be dead on pcs_post_config() failure, but we don't do that for pcs_enable() failure.
>
> Maybe I need to explicitly state that pcs_disable() does not directly balance pcs_enable(), but that _and_ the effects of pcs_pre_config() and pcs_post_config().
>
> However, that itself will add to the problems. What if pcs_pre_config() and pcs_post_config() succeed but not pcs_enable()? pcs-639x needs pcs_disable() to be called, but if we require pcs_disable() to be balanced with a successful call to pcs_enable(), that messes up that driver, and pretty much makes it impossible to work around the errata.
What if we reordered phylink_major_config() such that phylink_pcs_enable() comes first, followed by phylink_pcs_pre_config() -> phylink_mac_config() -> phylink_pcs_post_config()? Superficially looking at pcs-639x, I don't think it would break. If we did that, we'd effectively have to also call pcs_disable() when pcs_post_config() fails, and that is semantically compatible with saying that pcs_disable() is balanced with pcs_enable(). It also gives the ability for drivers such as pcs-639x to unwind in pcs_disable() any actions done in pcs_enable(), pcs_pre_config() or pcs_post_config(). Plus, it's more natural/useful from an API perspective to say "the PCS has to be enabled in order for anything to be done with it", rather than the current "first mac_config cycle runs with the PCS not enabled; subsequent mac_config cycles run with the PCS enabled".
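To make the proposed ordering concrete, here is a minimal userspace sketch, not kernel code. Everything named mock_* is hypothetical and only models the call order and the error path; the `powered` flag stands in for the kind of side effect a pre-config hook might leave behind (as described for pcs-639x), and the mac_config step is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PCS; this is NOT struct phylink_pcs. */
struct mock_pcs {
	bool enabled;          /* set by enable, cleared by disable */
	bool powered;          /* side effect left behind by pre_config */
	bool fail_enable;      /* inject a failure in enable */
	bool fail_post_config; /* inject a failure in post_config */
};

static int mock_pcs_enable(struct mock_pcs *pcs)
{
	if (pcs->fail_enable)
		return -1;
	pcs->enabled = true;
	return 0;
}

/* Undoes enable AND whatever pre/post config changed. */
static void mock_pcs_disable(struct mock_pcs *pcs)
{
	/* With the proposed order, disable only follows a successful enable. */
	assert(pcs->enabled);
	pcs->powered = false;
	pcs->enabled = false;
}

static void mock_pcs_pre_config(struct mock_pcs *pcs)
{
	pcs->powered = true; /* e.g. an errata workaround powering the block */
}

static int mock_pcs_post_config(struct mock_pcs *pcs)
{
	return pcs->fail_post_config ? -1 : 0;
}

/* Proposed order: enable -> pre_config -> mac_config -> post_config. */
static int mock_major_config(struct mock_pcs *pcs)
{
	int err;

	err = mock_pcs_enable(pcs);
	if (err)
		return err; /* nothing has been touched yet, nothing to undo */

	mock_pcs_pre_config(pcs);

	/* mac_config would run here, with the PCS already enabled */

	err = mock_pcs_post_config(pcs);
	if (err) {
		/* disable stays balanced with the successful enable above */
		mock_pcs_disable(pcs);
		return err;
	}

	return 0;
}
```

In this model a failing enable leaves no state behind, and a failing post_config is cleaned up by a disable that is still paired with a successful enable. Whether real drivers tolerate having their pre/post config hooks run after enable is exactly the open question.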
> If you feel strongly about this, then the only thing I can think of doing is to move this SerDes support out of stmmac and into phylink (which is a point I already raised in the cover message) so that its failure can be dealt with at the higher level, where we can ensure that phy_power_off() is balanced with phy_power_on(). However, that means pushing even more of the stmmac specific "we need the clocks running to access registers XYZ or reset" weirdness into phylink.
I think core phylink support for generic PHYs eventually makes sense, but at this stage it's perhaps too early; there's too much we don't yet know. I would wait at least until it's clear, with an upstream example, that multiple generic PHYs per phylink instance are needed: 1 SerDes PHY per lane (for 40GBase-R etc), plus 1 retimer PHY per lane direction. Also, how do those retimers differ from SerDes PHYs? At the very least, the phy_validate() of SerDes PHYs should be additive w.r.t. supported_interfaces, whereas the phy_validate() of retimers should be subtractive.
Also, moving the SerDes PHY into phylink only sidesteps the problem: if the PCS driver ever needs to allocate memory, the same problem returns. I have downstream patches for a software backplane AN/LT state machine in phylink_pcs, which is allocated in pcs_enable() and freed in pcs_disable().
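One possible reading of the additive versus subtractive distinction for phy_validate() above, as a userspace sketch. The MOCK_IF_* bits and combine_supported() are hypothetical and model supported_interfaces as a plain bitmask; how phylink would really combine the results is an assumption here, not something the thread settles.

```c
#include <stdint.h>

/* Hypothetical interface bits; not the kernel's phy_interface_t values. */
#define MOCK_IF_SGMII     (1u << 0)
#define MOCK_IF_1000BASEX (1u << 1)
#define MOCK_IF_10GBASER  (1u << 2)

/*
 * Assumed model:
 *  - a SerDes PHY is additive: modes it validates are added to the base set
 *  - a retimer is subtractive: modes it cannot pass are removed from the set
 */
static uint32_t combine_supported(uint32_t base, uint32_t serdes_ok,
				  uint32_t retimer_ok)
{
	uint32_t supported = base;

	supported |= serdes_ok;  /* additive step */
	supported &= retimer_ok; /* subtractive step */

	return supported;
}
```

For instance, with a base of SGMII only, a SerDes that validates 1000BASE-X and 10GBASE-R, and a retimer that passes SGMII and 10GBASE-R, this model ends up with SGMII and 10GBASE-R.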