Re: [PATCH iwl-net] e1000: Move cancel_work_sync to avoid deadlock
From: Stanislav Fomichev <hidden>
Date: 2025-05-30 15:07:31
Also in:
intel-wired-lan, lkml
On 05/30, Joe Damato wrote:
> Previously, e1000_down called cancel_work_sync for the e1000 reset task
> (via e1000_down_and_stop), which takes RTNL.
>
> As reported by users and syzbot, a deadlock is possible due to lock
> inversion in the following scenario:
>
> CPU 0:
>   - RTNL is held
>   - e1000_close
>   - e1000_down
>   - cancel_work_sync (takes the work queue mutex)
>   - e1000_reset_task
>
> CPU 1:
>   - process_one_work (takes the work queue mutex)
>   - e1000_reset_task (takes RTNL)

nit: as Jakub mentioned in another thread, it seems more about the
flush_work waiting for the reset_task to complete rather than
wq mutexes (which are fake)?

CPU 0:
  - RTNL is held
  - e1000_close
  - e1000_down
  - cancel_work_sync
  - __flush_work
  - <wait here for the reset_task to finish>

CPU 1:
  - process_one_work
  - e1000_reset_task (takes RTNL)
  - <but cpu 0 already holds rtnl>

The fix looks good!

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
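
For readers following along, the wait-while-holding-the-lock pattern in the trace above can be sketched in userspace with pthreads. This is only an analogy and not the driver code or the patch: fake_rtnl, fake_reset_task and close_then_wait are hypothetical names, a pthread mutex stands in for RTNL, and pthread_join stands in for the wait inside cancel_work_sync. The sketch shows the safe ordering, where the waiter drops the lock before waiting for the worker.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: a plain mutex plays the role of RTNL. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
static int resets_done;

/* Plays the role of the reset task: it needs the lock to do its work. */
static void *fake_reset_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fake_rtnl);
    resets_done++;
    pthread_mutex_unlock(&fake_rtnl);
    return NULL;
}

/*
 * Safe ordering: wait for the worker only after dropping the lock.
 * Returns the number of completed resets, or -1 if the worker could
 * not be started.
 */
static int close_then_wait(void)
{
    pthread_t worker;
    int rc;

    pthread_mutex_lock(&fake_rtnl);            /* "RTNL is held" */
    if (pthread_create(&worker, NULL, fake_reset_task, NULL) != 0) {
        pthread_mutex_unlock(&fake_rtnl);
        return -1;
    }

    /*
     * Calling pthread_join() at this point, with fake_rtnl still held,
     * would never return: the worker blocks on the lock while this
     * thread blocks waiting for the worker to finish.
     */
    pthread_mutex_unlock(&fake_rtnl);

    rc = pthread_join(worker, NULL);           /* the "wait for the task" step */
    assert(rc == 0);
    return resets_done;
}
```

Built with -pthread, the first call to close_then_wait() returns once the worker has run. Moving the pthread_join() call above the unlock reproduces the hang described for CPU 0 and CPU 1.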