Re: [PATCH] Use a spinlock to guard `fep->ptp_clk_on`

From: Csókás Bence <hidden>
Date: 2022-09-01 07:52:54

On 2022. 08. 31. 18:24, Andrew Lunn wrote:
 >>> Please keep to reverse christmas tree
 >>
 >> checkpatch didn't tell me that was a requirement... Easy to fix though.
 >
 > checkpatch does not have the smarts to detect this. And it is a netdev
 > only thing.

I thought checkpatch also has the per-subsystem rules, too.

 > There is also a different between not being able to sleep, and not
 > being able to process an interrupt for some other hardware. You got a
 > report about taking a mutex in atomic context. That just means you
 > cannot sleep, probably because a spinlock is held. That is very
 > different to not being able to handle interrupts. You only need
 > spin_lock_irqsave() if the interrupt handler also needs the same spin
 > lock to protect it from a thread using the spin lock.

Alright, I will switch to plain `spin_lock()` then.

On 2022. 08. 31. 19:12, Francesco Dolcini wrote:

On Wed, Aug 31, 2022 at 02:56:31PM +0200, Csókás Bence wrote:

quoted

Mutexes cannot be taken in a non-preemptible context,
causing a panic in `fec_ptp_save_state()`. Replacing
`ptp_clk_mutex` by `ptp_clk_lock` fixes this.

I would probably add the kernel BUG trace to the commit message.

quoted

Fixes: 91c0d987a9788dcc5fe26baafd73bf9242b68900
Fixes: 6a4d7234ae9a3bb31181f348ade9bbdb55aeb5c5
Fixes: f79959220fa5fbda939592bf91c7a9ea90419040

Just

Fixes: f79959220fa5 ("fec: Restart PPS after link state change >

quoted

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>

Is this https://lore.kernel.org/all/20220827160922.642zlcd5foopozru@pengutronix.de/ (local) the report?

Yes, precisely.

I just stumbled on the same issue on latest torvalds 6.0-rc3.

[   22.718465] =============================
[   22.725431] [ BUG: Invalid wait context ]
[   22.732439] 6.0.0-rc3-00007-gdcf8e5633e2e #1 Tainted: G        W
[   22.742278] -----------------------------
[   22.749217] kworker/3:1/45 is trying to lock:
[   22.757157] c211b71c (&fep->ptp_clk_mutex){+.+.}-{3:3}, at: fec_ptp_gettime+0x30/0xcc
[   22.770814] other info that might help us debug this:
[   22.778718] context-{4:4}
[   22.784065] 4 locks held by kworker/3:1/45:
[   22.790949]  #0: c20072a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x1e4/0x730
[   22.806494]  #1: e6a15f18 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x730
[   22.822744]  #2: c287a478 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x34/0x228
[   22.835884]  #3: c211b2a4 (&dev->tx_global_lock){+...}-{2:2}, at: netif_tx_lock+0x10/0x1c

Thank you, this lock was the source of the problem!

Francesco

Bence

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help