Re: [syzbot] [net?] possible deadlock in rtnl_newlink
From: Joe Damato <hidden>
Date: 2025-05-29 23:54:16
Also in: lkml
On Thu, May 29, 2025 at 09:45:10AM -0700, Stanislav Fomichev wrote:
> On 05/29, Jakub Kicinski wrote:
> > On Thu, 29 May 2025 08:59:43 -0700 Stanislav Fomichev wrote:
> > > So this is internal WQ entry lock that is being reordered with rtnl
> > > lock. But looking at process_one_work, I don't see actual locks,
> > > mostly lock_map_acquire/lock_map_release calls to enforce some
> > > internal WQ invariants. Not sure what to do with it, will try to
> > > read more.
> >
> > Basically a flush_work() happens while holding rtnl_lock, but the
> > work itself takes that lock. It's a driver bug.
>
> e400c7444d84 ("e1000: Hold RTNL when e1000_down can be called") ?
>
> I think similar things (but wrt netdev instance lock) are happening
> with iavf: iavf_remove calls cancel_work_sync while holding the
> instance lock and the work callbacks grab the instance lock as well :-/
I think this is probably the same issue as the one in this thread: https://lore.kernel.org/netdev/CAP=Rh=OEsn4y_2LvkO3UtDWurKcGPnZ_NPSXK=FbgygNXL37Sw@mail.gmail.com/

I posted a response there about how the problem could possibly be avoided (based on my rough reading of the driver code), but I am still thinking this through.