Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

From: Paolo Abeni <pabeni@redhat.com>
Date: 2026-05-28 09:22:02
Also in: lkml

On 5/25/26 5:13 AM, Tim JH Chen wrote:
> v2: Address two concerns raised in AI-assisted code review of v1:
>
> 1. [High] t7xx_dpmaif_resume() was unconditionally restoring state to
>    DPMAIF_STATE_PWRON regardless of the state before suspend.  If the
>    modem had already been moved to DPMAIF_STATE_PWROFF by
>    t7xx_dpmaif_md_state_callback() (MD_STATE_EXCEPTION or
>    MD_STATE_STOPPED) prior to system suspend, resume would incorrectly
>    re-arm the TX kthread guard, allowing TX HW writes against a modem
>    the MD state machine considers stopped or in exception.
>
>    Fix: save dpmaif_ctrl->state into pre_suspend_state at the start of
>    t7xx_dpmaif_suspend() and restore that saved value in
>    t7xx_dpmaif_resume(), so a pre-suspend PWROFF is preserved across
>    the suspend/resume cycle.
>
> 2. [Medium] The v1 second state check before pm_runtime_resume_and_get()
>    only narrowed the TOCTOU window -- it did not close it.  The state
>    field was a plain enum read and written without any lock or
>    READ_ONCE/WRITE_ONCE annotation.  After the check passed on one CPU,
>    the suspend path on another CPU could still set state=PWROFF and
>    begin PM teardown before the kthread reached pm_runtime_resume_and_get(),
>    reproducing the deadlock.
>
>    Fix: introduce tx_pm_lock (struct mutex) held by the kthread across
>    the [state check -> pm_runtime_resume_and_get -> pm_runtime_put]
>    sequence.  t7xx_dpmaif_suspend() acquires this lock before setting
>    DPMAIF_STATE_PWROFF, which serialises with any in-progress kthread
>    PM section and guarantees the kthread cannot enter
>    pm_runtime_resume_and_get() after the state flag is set.
>    READ_ONCE/WRITE_ONCE are added at every access point of the state
>    flag that crosses the suspend/resume boundary to prevent
>    compiler-visible tearing.
>
> The original v1 description of the root cause and tested fix still
> applies (deadlock between t7xx_dpmaif_tx_hw_push_thread calling
> pm_runtime_resume_and_get() and the system PM suspend path, triggered
> with ASPM L1 enabled after repeated suspend/resume cycles).
>
> Tested: no soft lockup over 500+ suspend/resume cycles with SIM
> registered and ASPM L1 enabled (previously triggered in < 300).
>
> Fixes: 05f7e89ab ("Linux 6.19")
> Signed-off-by: Tim JH Chen <redacted>

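
For readers without the patch at hand, the two fixes in the quoted
changelog can be sketched as a small user-space model. This is only an
illustration, not the driver code: the struct, the model_* helpers and
the hw_pushes counter are hypothetical, the enum is a simplified
stand-in for the driver's state values, a pthread mutex stands in for
the kernel struct mutex, and C11 atomics stand in for
READ_ONCE/WRITE_ONCE. The runtime PM calls appear only as comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the driver's state enum (assumption). */
enum dpmaif_state { DPMAIF_STATE_PWROFF, DPMAIF_STATE_PWRON };

/* Hypothetical model of the fields the changelog talks about. */
struct dpmaif_model {
	pthread_mutex_t tx_pm_lock;  /* models the new tx_pm_lock mutex */
	_Atomic int state;           /* models READ_ONCE/WRITE_ONCE access */
	int pre_suspend_state;       /* state saved on suspend (fix 1) */
	int hw_pushes;               /* counts modelled TX HW writes */
};

void model_init(struct dpmaif_model *c, int state)
{
	assert(state == DPMAIF_STATE_PWROFF || state == DPMAIF_STATE_PWRON);
	pthread_mutex_init(&c->tx_pm_lock, NULL);
	atomic_store(&c->state, state);
	c->pre_suspend_state = state;
	c->hw_pushes = 0;
}

/* One pass of the TX thread; returns true if it touched the hardware. */
bool model_tx_push_once(struct dpmaif_model *c)
{
	bool pushed = false;

	/* Fix 2: check and PM section happen under one lock. */
	pthread_mutex_lock(&c->tx_pm_lock);
	if (atomic_load(&c->state) == DPMAIF_STATE_PWRON) {
		/* kernel: pm_runtime_resume_and_get(), push, pm_runtime_put() */
		c->hw_pushes++;
		pushed = true;
	}
	pthread_mutex_unlock(&c->tx_pm_lock);
	return pushed;
}

void model_suspend(struct dpmaif_model *c)
{
	/* Waits for any TX pass that is inside its PM section. */
	pthread_mutex_lock(&c->tx_pm_lock);
	c->pre_suspend_state = atomic_load(&c->state);  /* fix 1: save */
	atomic_store(&c->state, DPMAIF_STATE_PWROFF);
	pthread_mutex_unlock(&c->tx_pm_lock);
}

void model_resume(struct dpmaif_model *c)
{
	/* Fix 1: restore what was saved, not an unconditional PWRON. */
	atomic_store(&c->state, c->pre_suspend_state);
}
```

In this model a TX pass that starts after model_suspend() returns
always sees PWROFF and skips the hardware, and a modem that was
already PWROFF before suspend stays PWROFF after model_resume().
Whether the real patch gets every path right is a separate question
that the model does not answer.
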
Please have a much closer read of:

Documentation/process/

especially:

Documentation/process/maintainer-netdev.rst

before your next submission, because this one is still lacking in many ways:

- the subject prefix must include the target tree (net) and a revision
  number (for the next iteration: v3)
- the Fixes tag should point to the commit that actually introduced the bug
- the commit message should describe the issue and the fix, as in v1;
  any changelog-related information (~all of the above) should land after
  the tag area and a '---' separator.
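
As a rough sketch of the layout these points describe (the subject text
is taken from this thread; everything in angle brackets is a
placeholder, since the thread does not name the commit that introduced
the bug):

```
Subject: [PATCH net v3] net: wwan: t7xx: fix race between TX thread and system PM suspend

<description of the issue and of the fix, as in v1>

Fixes: <SHA of the commit that introduced the bug> ("<its subject line>")
Signed-off-by: <author>
---
v3:
 - <what changed since v2>
v2:
 - <what changed since v1>

<diffstat and diff follow>
```

Text placed after the '---' line is not part of the commit message, so
the per-revision notes stay out of the git history.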

Also sashiko still has quite a few concerns:

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260525031320.519435-1-tim.jh.chen%40wnc.com.tw

and many of them look real.

/P