Thread: 35 messages, 7 authors, 2026-01-22

Re: [net-next,05/14] net: stmmac: add stmmac core serdes support

From: Vladimir Oltean <olteanv@gmail.com>
Date: 2026-01-21 16:23:50
Also in: linux-arm-kernel, linux-arm-msm, linux-phy

On Wed, Jan 21, 2026 at 02:46:42PM +0000, Russell King (Oracle) wrote:
On Tue, Jan 20, 2026 at 02:11:14PM +0200, Vladimir Oltean wrote:
On Tue, Jan 20, 2026 at 10:12:46AM +0000, Russell King (Oracle) wrote:
First, I'll say I'm on a very short fuse today; no dinner last night,
at the hospital up until 5:30am, and a fucking cold caller rang the door
bell at 10am this morning. Just fucking our luck.
Sorry to hear that.
On Tue, Jan 20, 2026 at 10:18:44AM +0200, Vladimir Oltean wrote:
Isn't it sufficient to set pl->pcs to NULL when pcs_enable() fails and
after calling pcs_disable(), though?
No. We've already called mac_prepare(), pcs_pre_config(),
pcs_post_config() by this time, we're past the point of being able to
unwind.
I've set out to resolve a much smaller problem.

Calling it a full "unwind" is perhaps a bit much, because pcs_pre_config()
and pcs_post_config() don't have unwinding equivalents, unlike how
pcs_enable() has pcs_disable(). I don't see what API convention would be
violated if phylink decided to drop a PCS whose enable() returned an error.
While pcs_pre_config() and pcs_post_config() do not have unwinding
equivalents (what would they be?) the issue here is that these could
have changed any state that isn't simply undone by calling
pcs_disable().

For example, pcs_pre_config() could have reprogrammed signal routing,
clocking, or power supplies to blocks.

This already applies to Marvell DSA pcs-639x.c, where the pre/post
config hooks change the power state of the PCS block (for errata
handling), and the only way that gets undone is via a call to
pcs_disable() which explicitly disables IRQs and power for the PCS. Its
pcs_disable() isn't a strict reversal of pcs_enable(), it does more.

We already declare the interface to be dead on pcs_post_config()
failure, but we don't do that for pcs_enable() failure.

Maybe I need to explicitly state that pcs_disable() does not directly
balance pcs_enable(), but that _and_ the effects of pcs_pre_config()
and pcs_post_config(). However, that itself will add to the problems.
What if pcs_pre_config() and pcs_post_config() succeed but not
pcs_enable()? pcs-639x needs pcs_disable() to be called, but if we
require pcs_disable() to be balanced with a successful call to
pcs_enable(), that messes up that driver, and pretty much makes it
impossible to work around the errata.
What if we reordered phylink_major_config() such that phylink_pcs_enable()
comes first, followed by phylink_pcs_pre_config() -> phylink_mac_config() ->
phylink_pcs_post_config()? Superficially looking at pcs-639x, I don't
think it would break.

If we did that, we'd effectively have to also call pcs_disable() when
pcs_post_config() fails, and that is semantically compatible with saying
that pcs_disable() is balanced with pcs_enable(). It also gives the
ability for drivers such as pcs-639x to unwind in pcs_disable() any
actions done in pcs_enable(), pcs_pre_config() or pcs_post_config().

Plus, it's more natural/useful from an API perspective to say
"the PCS has to be enabled in order for anything to be done with it",
rather than the current "first mac_config cycle runs with the PCS not
enabled; subsequent mac_config cycles run with the PCS enabled".
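
As a rough illustration of the reordering, here is a userspace C model, not
the actual phylink code. Every name in it (struct model_pcs,
model_major_config(), the fail_at knob) is hypothetical. It only shows that
with pcs_enable() first, any later failure can be followed by pcs_disable(),
which keeps the two balanced while still letting pcs_disable() undo what the
pre/post config hooks did.

```c
#include <assert.h>
#include <errno.h>

enum step { STEP_NONE, STEP_ENABLE, STEP_PRE_CONFIG, STEP_POST_CONFIG };

/* Hypothetical model of a PCS; none of these names exist in phylink. */
struct model_pcs {
	enum step fail_at;	/* step that should report an error */
	int enable_calls;	/* counts successful enable() calls */
	int disable_calls;
	int powered;		/* state touched by pre_config, as in the errata case */
};

static int model_pcs_enable(struct model_pcs *pcs)
{
	if (pcs->fail_at == STEP_ENABLE)
		return -ENOMEM;
	pcs->enable_calls++;
	return 0;
}

static void model_pcs_disable(struct model_pcs *pcs)
{
	/* Undoes enable() and whatever the pre/post config hooks changed. */
	pcs->powered = 0;
	pcs->disable_calls++;
}

static int model_pcs_pre_config(struct model_pcs *pcs)
{
	pcs->powered = 1;
	return pcs->fail_at == STEP_PRE_CONFIG ? -EIO : 0;
}

static int model_pcs_post_config(struct model_pcs *pcs)
{
	return pcs->fail_at == STEP_POST_CONFIG ? -EIO : 0;
}

/* Proposed order: enable -> pre_config -> mac_config -> post_config. */
static int model_major_config(struct model_pcs *pcs)
{
	int err;

	assert(pcs);

	err = model_pcs_enable(pcs);
	if (err)
		return err;	/* nothing was enabled, so nothing to disable */

	err = model_pcs_pre_config(pcs);
	if (err)
		goto out_disable;

	/* mac_config() would run here; it cannot fail in this model. */

	err = model_pcs_post_config(pcs);
	if (err)
		goto out_disable;

	return 0;

out_disable:
	model_pcs_disable(pcs);
	return err;
}
```

With fail_at set to STEP_POST_CONFIG, the model ends with one enable, one
disable and powered back at 0; with STEP_ENABLE, disable is never called.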
If you feel strongly about this, then the only thing I can think of
doing is to move this SerDes support out of stmmac and into phylink
(which is a point I already raised in the cover message) so that
its failure can be dealt with at the higher level, where we can
ensure that phy_power_off() is balanced with phy_power_on(). However,
that means pushing even more of the stmmac specific "we need the
clocks running to access registers XYZ or reset" weirdness into
phylink.
I think core phylink support for generic PHYs eventually makes sense,
but at this stage it's perhaps too early, there's too much we don't yet
know. I would wait at least until it's clear, with an upstream example,
that multiple generic PHYs per phylink instance are needed: 1 SerDes PHY
per lane (for 40GBase-R etc), plus 1 retimer PHY per lane direction.
Also, how do those retimers differ from SerDes PHYs? At the very least,
the phy_validate() of SerDes PHYs should be additive w.r.t.
supported_interfaces, whereas the phy_validate() of retimers should be
subtractive.
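
One possible reading of "additive" versus "subtractive", sketched on a plain
bitmask rather than the kernel's interface bitmap. The MODE_* bits and the
model_*() helpers are made up for this example, and the exact semantics are
an assumption: a SerDes PHY ORs the modes it can provide into the supported
set, while a retimer masks out whatever it cannot pass through.

```c
#include <assert.h>

/* Hypothetical mode bits; the kernel tracks PHY_INTERFACE_MODE_* in a bitmap. */
#define MODE_SGMII	(1u << 0)
#define MODE_10GBASER	(1u << 1)
#define MODE_USXGMII	(1u << 2)

/* Additive: the SerDes PHY contributes the modes it can provide. */
static unsigned int model_serdes_validate(unsigned int supported,
					  unsigned int serdes_modes)
{
	return supported | serdes_modes;
}

/* Subtractive: the retimer removes modes it cannot pass through. */
static unsigned int model_retimer_validate(unsigned int supported,
					   unsigned int retimer_modes)
{
	return supported & retimer_modes;
}

/* SerDes first, then retimer, starting from an empty supported set. */
static unsigned int model_validate(unsigned int serdes_modes,
				   unsigned int retimer_modes)
{
	unsigned int supported = 0;

	supported = model_serdes_validate(supported, serdes_modes);
	supported = model_retimer_validate(supported, retimer_modes);

	/* A subtractive step can never add a mode the SerDes did not offer. */
	assert((supported & ~serdes_modes) == 0);
	return supported;
}
```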

Also, moving SerDes PHY support into phylink only avoids the problem for now;
as soon as a PCS driver needs to allocate memory, it will return. I have downstream
patches for a software backplane AN/LT state machine in phylink_pcs,
which is allocated in pcs_enable() and freed in pcs_disable().
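
To make the allocation case concrete, here is a minimal userspace sketch. It
is not the downstream code; struct anlt_state, struct anlt_pcs and the
simulate_oom parameter are invented for illustration. The point is only that
an enable hook which allocates can fail with -ENOMEM independently of any
SerDes handling, and that the matching free lives in the disable hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-PCS state for a software AN/LT state machine. */
struct anlt_state {
	int lane_count;
};

struct anlt_pcs {
	struct anlt_state *anlt;	/* NULL while the PCS is disabled */
};

/* Allocates the state machine; simulate_oom forces the failure path. */
static int anlt_pcs_enable(struct anlt_pcs *pcs, int simulate_oom)
{
	struct anlt_state *st;

	assert(pcs && !pcs->anlt);

	st = simulate_oom ? NULL : calloc(1, sizeof(*st));
	if (!st)
		return -ENOMEM;

	pcs->anlt = st;
	return 0;
}

/* Frees what enable() allocated; safe to call after a failed enable(). */
static void anlt_pcs_disable(struct anlt_pcs *pcs)
{
	free(pcs->anlt);
	pcs->anlt = NULL;
}
```

If the caller only invokes the disable hook after a successful enable, the
failed-enable path leaves nothing behind; the open question is what the
caller does with the PCS after that failure.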