Thread: 11 messages, 4 authors, 2026-02-03

Re: [PATCH net V2 2/4] net/mlx5: Fix deadlock between devlink lock and esw->wq

From: Cosmin Ratiu <hidden>
Date: 2026-02-02 14:48:32
Also in: linux-rdma, lkml

On Thu, 2026-01-29 at 15:40 -0800, Jakub Kicinski wrote:
> On Thu, 29 Jan 2026 10:33:40 +0000 Cosmin Ratiu wrote:
> > > This is quite an ugly hack, is there no way to avoid the flush
> > > and let the work discover that what it was supposed to do is no
> > > longer needed?
> >
> > Not possible, unfortunately. I stared at it for quite a while. The
> > wq is flushed because the esw is being unconfigured, which removes
> > data structs the work handler uses. Flushing the work is required,
> > otherwise we'll run into worse issues.
>
> And having a refcount on (I presume) struct mlx5_esw_functions
> so that work can hold a ref is not an option?
> Are you planning to revisit this in -next?

Currently, mlx5_eswitch_disable_locked (with the devlink lock held)
waits for esw_vfs_changed_event_handler to finish.
The event handler needs to acquire the same lock and load/unload all
VFs, which touches the entire esw.
I don't currently see how to use reference counting on the esw to avoid
waiting for the handler.
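
To make the lock ordering concrete, here is a minimal userspace sketch
using pthreads. It is not driver code and not the fix itself: devlink_lock,
struct fake_work, vfs_changed_handler and disable_locked_sees_contention
are hypothetical stand-ins, and pthread_join() stands in for flushing the
workqueue. The handler uses a trylock only so that the demo terminates;
with a blocking lock at the same point, neither thread could make progress.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the devlink instance lock. */
static pthread_mutex_t devlink_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical work item; only records what the handler observed. */
struct fake_work {
	bool lock_was_busy;
};

/* Plays the role of the VFs-changed event handler. */
static void *vfs_changed_handler(void *arg)
{
	struct fake_work *work = arg;
	int err = pthread_mutex_trylock(&devlink_lock);

	if (err != 0) {
		/*
		 * EBUSY: the flusher owns the lock and is waiting for us.
		 * A blocking pthread_mutex_lock() here would never return.
		 */
		work->lock_was_busy = (err == EBUSY);
		return NULL;
	}

	/* Lock acquired: this is where VFs would be loaded/unloaded. */
	pthread_mutex_unlock(&devlink_lock);
	return NULL;
}

/*
 * Plays the role of the disable path: it holds the lock and waits for
 * the handler to finish. Returns true if the handler found the lock busy.
 */
static bool disable_locked_sees_contention(void)
{
	struct fake_work work = { .lock_was_busy = false };
	pthread_t worker;

	pthread_mutex_lock(&devlink_lock);
	if (pthread_create(&worker, NULL, vfs_changed_handler, &work) != 0) {
		pthread_mutex_unlock(&devlink_lock);
		return false;
	}
	pthread_join(worker, NULL);	/* stands in for the wq flush */
	pthread_mutex_unlock(&devlink_lock);

	return work.lock_was_busy;
}
```

Because the lock is held for the handler's whole lifetime, the handler
always finds it busy, which is exactly the window where a blocking acquire
would hang both sides.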

But we can have a deeper look as part of an internal task to improve
this. For now, please accept the V3 fix (about-to-be-sent) with the
current approach because we couldn't find a better way.

Cosmin.