Re: RTL8402 stops working after hibernate/resume
From: Petr Tesarik <hidden>
Date: 2020-09-29 21:17:19
Also in:
linux-clk
Hi, On Tue, 29 Sep 2020 22:08:56 +0200 Hans de Goede [off-list ref] wrote:
Hi, On 9/29/20 9:07 PM, Petr Tesarik wrote:quoted
Hi Heiner (and now also Hans)! @Hans: I'm adding you to this conversation, because you're the author of commit b1e3454d39f99, which seems to break the r8169 driver on a laptop of mine.Erm, no, as you bi-sected yourself already commit 9f0b54cd167219 ("r8169: move switching optional clock on/off to pll power functions") Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail") is about 18 months older.
Well, I'm no expert on power management, nor clock drivers or network drivers. I'm an end user, so I had to accept Hans' claim that the pointer should be NULL...
[...]quoted
That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking at the platform initialization code, the "ether_clk" alias is created unconditionally. Hans already added.Petr, unconditionally is not really correct here, just as claiming above that my commit broke things was not really correct either.
I'm sorry if you feel offended. By "unconditionally" I mean that this clock cannot be disabled by the user (e.g. with a kernel command line option).
[...] So with that all clarified lets try to see if we can figure out *why* this is actually happening. Petr, can you describe your hardware in some more detail, in the bits quoted when you first Cc-ed me there is not that much detail. What is the vendor and model of your laptop?
Seeing that Heiner has already agreed to revert his patch, I'm not sure this is still relevant, but here I go. This is an ASUS X453MA notebook with a dual Intel(R) Celeron(R) CPU N2840 @ 2.16GHz, so it's totally expected that my system uses the Intel Atom Processor E3800 Series drivers.
Looking closer at commit 9f0b54cd167219
("r8169: move switching optional clock on/off to pll power functions")
I notice that the functions which now enable/disable the clock:
rtl_pll_power_up() and rtl_pll_power_down()
Only run when the interface is up during suspend/resume.
Petr, I guess the laptop is not connected to ethernet when you
hibernate it?The laptop is always connected to an active 1000G Ethernet switch port. However, there seem to be two bugs related to the Realtek driver: one causes a hard hang on driver load (that's the one we're handling here), the other is related to incorrect Rx queue initialization after resume. They may or may not be related.
That means that on resume the clock will not be re-enabled. This is a subtle but important change and I believe that this is what is breaking things. I guess that the PLL which rtl_pll_power_up() / rtl_pll_power_down() controls is only used for ethernet-timing. But the external clock controlled through pt->clk is a replacement for using an external crystal with the r8169 chip. So with it disabled, the entire chip does not have a clock and is essentially dead. It can then e.g. not respond to any pci-e reads/writes done to it.
...which would nicely explain the hang, indeed. Thanks, Petr T
So I believe that the proper fix for this is to revert
commit 9f0b54cd167219
("r8169: move switching optional clock on/off to pll power functions")
As that caused the whole chip's clock to be left off after
a suspend/resume while the interface is down.
Also some remarks about this while I'm being a bit grumpy about
all this anyways (sorry):
1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
to pll power functions") commit's message does not seem to really
explain why this change was made...
2. If a git blame would have been done to find the commit adding
the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional
ether_clk clock") then you could have known that the clk in question
is an external clock for the entire chip, the commit message pretty
clearly states this (although "the entire" part is implied only) :
"On some boards a platform clock is used as clock for the r8169 chip,
this commit adds support for getting and enabling this clock (assuming
it has an "ether_clk" alias set on it).
This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
enabled by the firmware") which is a previous attempt to fix this for
some x86 boards, but this causes all Cherry Trail SoC using boards to
not reach there lowest power states when suspending.
This commit (together with an atom-pmc-clk driver commit adding the
alias) fixes things properly by making the r8169 get the clock and
enable it when it needs it."
Regards,
Hans Attachments
- (unnamed) [application/pgp-signature] 488 bytes