Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
From: Paolo Abeni <pabeni@redhat.com>
Date: 2026-05-28 09:22:02
Also in:
lkml
On 5/25/26 5:13 AM, Tim JH Chen wrote:
v2: Address two concerns raised in AI-assisted code review of v1:
1. [High] t7xx_dpmaif_resume() was unconditionally restoring state to
DPMAIF_STATE_PWRON regardless of the state before suspend. If the
modem had already been moved to DPMAIF_STATE_PWROFF by
t7xx_dpmaif_md_state_callback() (MD_STATE_EXCEPTION or
MD_STATE_STOPPED) prior to system suspend, resume would incorrectly
re-arm the TX kthread guard, allowing TX HW writes against a modem
the MD state machine considers stopped or in exception.
Fix: save dpmaif_ctrl->state into pre_suspend_state at the start of
t7xx_dpmaif_suspend() and restore that saved value in
t7xx_dpmaif_resume(), so a pre-suspend PWROFF is preserved across
the suspend/resume cycle.
2. [Medium] The v1 second state check before pm_runtime_resume_and_get()
only narrowed the TOCTOU window -- it did not close it. The state
field was a plain enum read and written without any lock or
READ_ONCE/WRITE_ONCE annotation. After the check passed on one CPU,
the suspend path on another CPU could still set state=PWROFF and
begin PM teardown before the kthread reached pm_runtime_resume_and_get(),
reproducing the deadlock.
Fix: introduce tx_pm_lock (struct mutex) held by the kthread across
the [state check -> pm_runtime_resume_and_get -> pm_runtime_put]
sequence. t7xx_dpmaif_suspend() acquires this lock before setting
DPMAIF_STATE_PWROFF, which serialises with any in-progress kthread
PM section and guarantees the kthread cannot enter
pm_runtime_resume_and_get() after the state flag is set.
READ_ONCE/WRITE_ONCE are added at every access point of the state
flag that crosses the suspend/resume boundary to prevent
compiler-visible tearing.
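
For illustration only, the two fixes described above can be modelled in userspace C. This is a minimal sketch, not the driver code: the struct, the function names and the pm_users counter are hypothetical stand-ins, a pthread mutex plays the role of tx_pm_lock, and the kernel calls are reduced to comments. It also assumes resume takes the same lock, which the description does not state. Because every access to the state field happens under the mutex here, the sketch does not need the READ_ONCE/WRITE_ONCE annotations the patch uses.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's state enum and control struct. */
enum model_state { MODEL_PWROFF, MODEL_PWRON };

struct model_ctrl {
    pthread_mutex_t tx_pm_lock;          /* models the patch's tx_pm_lock */
    enum model_state state;              /* models dpmaif_ctrl->state */
    enum model_state pre_suspend_state;  /* state saved at suspend time */
    int pm_users;                        /* models runtime-PM references held by TX */
};

void model_init(struct model_ctrl *c, enum model_state initial)
{
    pthread_mutex_init(&c->tx_pm_lock, NULL);
    c->state = initial;
    c->pre_suspend_state = initial;
    c->pm_users = 0;
}

/*
 * TX side: the state check and the PM get/put pair form one critical
 * section, so suspend cannot flip the state between check and get.
 * Returns true if a push was performed.
 */
bool model_tx_push_once(struct model_ctrl *c)
{
    bool pushed = false;

    pthread_mutex_lock(&c->tx_pm_lock);
    if (c->state == MODEL_PWRON) {
        c->pm_users++;  /* stands in for pm_runtime_resume_and_get() */
        pushed = true;  /* TX hardware writes would happen here */
        c->pm_users--;  /* stands in for the matching pm_runtime_put */
    }
    pthread_mutex_unlock(&c->tx_pm_lock);
    return pushed;
}

/* Suspend side: wait for any in-progress TX section, then power off. */
void model_suspend(struct model_ctrl *c)
{
    pthread_mutex_lock(&c->tx_pm_lock);
    c->pre_suspend_state = c->state;  /* remember PWRON vs. PWROFF */
    c->state = MODEL_PWROFF;
    pthread_mutex_unlock(&c->tx_pm_lock);
}

/* Resume side: restore the saved state instead of forcing PWRON. */
void model_resume(struct model_ctrl *c)
{
    pthread_mutex_lock(&c->tx_pm_lock);
    c->state = c->pre_suspend_state;
    pthread_mutex_unlock(&c->tx_pm_lock);
}
```

In this model, a controller that was already in MODEL_PWROFF before model_suspend() stays powered off after model_resume(), and model_tx_push_once() can never take a PM reference once suspend has set the flag.
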
The original v1 description of the root cause and tested fix still
applies (deadlock between t7xx_dpmaif_tx_hw_push_thread calling
pm_runtime_resume_and_get() and the system PM suspend path, triggered
with ASPM L1 enabled after repeated suspend/resume cycles).
Tested: no soft lockup over 500+ suspend/resume cycles with SIM
registered and ASPM L1 enabled (previously triggered in < 300).
Fixes: 05f7e89ab ("Linux 6.19")
Signed-off-by: Tim JH Chen <redacted>

Please read Documentation/process/ much more carefully, especially
Documentation/process/maintainer-netdev.rst, before your next
submission, because this one is still lacking in many ways:

- the subject prefix must include the target tree (net) and a revision
  number (for the next iteration: v3)
- the Fixes tag should point to the commit that actually introduced the
  bug
- the commit message should describe the issue and the fix, as in v1;
  any changelog-related information (~all of the above) should land
  after the tag area and a '---' separator.

Also, sashiko still has quite a few concerns:

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260525031320.519435-1-tim.jh.chen%40wnc.com.tw

and many of them look real.

/P