Re: [Bug] "possible deadlock in rtnl_newlink" in Linux kernel v6.13
From: Joe Damato <hidden>
Date: 2025-05-29 23:50:21
Also in:
lkml
Subsystem:
intel ethernet drivers, networking drivers, the rest · Maintainers:
Tony Nguyen, Przemek Kitszel, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds
On Thu, May 22, 2025 at 04:05:05PM -0700, Jacob Keller wrote:
On 5/21/2025 5:52 PM, John wrote:quoted
Dear Linux Kernel Maintainers, I hope this message finds you well. I am writing to report a potential vulnerability I encountered during testing of the Linux Kernel version v6.13. Git Commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 (tag: v6.13) Bug Location: rtnl_newlink+0x86c/0x1dd0 net/core/rtnetlink.c:4011 Bug report: https://hastebin.com/share/ajavibofik.bash Complete log: https://hastebin.com/share/derufumuxu.perl Entire kernel config: https://hastebin.com/share/lovayaqidu.ini Root Cause Analysis: The deadlock warning is caused by a circular locking dependency between two subsystems: Path A (CPU 0): Holds rtnl_mutex in rtnl_newlink() -> Then calls e1000_close() -> Triggers e1000_down_and_stop() -> Calls __cancel_work_sync() -> Tries to flush adapter->reset_task (-> needs work_completion lock) Path B (CPU 1): Holds work_completion lock while running e1000_reset_task() -> Then calls e1000_down() -> Which tries to acquire rtnl_mutex These two execution paths result in a circular dependency:I guess this implies you can't cancel_work_sync while holding RTNL lock? Hmm. Or maybe its because calling e1000_down from the e1000_reset_task is a problem.quoted
CPU 0: rtnl_mutex -> work_completion CPU 1: work_completion -> rtnl_mutex This violates lock ordering and can lead to a deadlock under contention. This bug represents a classic case of lock inversion between networking core (rtnl_mutex) and a device driver (e1000 workqueue reset`). It is a design-level concurrency flaw that can lead to deadlocks under stress or fuzzing workloads. At present, I have not yet obtained a minimal reproducer for this issue. However, I am actively working on reproducing it, and I will promptly share any additional findings or a working reproducer as soon as it becomes available.This is likely a regression in e400c7444d84 ("e1000: Hold RTNL when e1000_down can be called") @Joe, thoughts?
Sorry for the delay, was out of the office for a bit. I agree with
the report that the locking order is problematic and with your
report that it was introduced by the above commit.
I wonder if e1000_down needs to cancel the reset_task at all?
If you look a layer below the original bug report, you'll note that
e1000_down calls e1000_reinit_locked which itself has the following
code:
/* only run the task if not already down */
if (!test_bit(__E1000_DOWN, &adapter->flags)) {
e1000_down(adapter);
e1000_up(adapter);
}
So, it seems like the flow in the e1000_down case would be something like this
(please correct me if I've gotten it wrong):
e1000_down -> e1000_down_and_stop (which sets the __E1000_DOWN bit) ->
cancel_work_sync -> e1000_reset_task -> grabs RTNL, calls e1000_reinit_locked
e1000_reinit_locked -> checks the bit via the code above and does nothing
I could be totally off here, but it seems like in the e1000_down case, calling
e1000_reinit_locked is unnecessary since the __E1000_DOWN bit prevents anything
from happening.
Maybe a potential solution might be to move the cancel_work_sync out of
e1000_down_and_stop and move it into e1000_remove directly?
Something vaguely like (untested):
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3f089c3d47b2..62a77b34c9ff 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c@@ -477,10 +477,6 @@ static void e1000_down_and_stop(struct e1000_adapter *adapter) cancel_delayed_work_sync(&adapter->phy_info_task); cancel_delayed_work_sync(&adapter->fifo_stall_task); - - /* Only kill reset task if adapter is not resetting */ - if (!test_bit(__E1000_RESETTING, &adapter->flags)) - cancel_work_sync(&adapter->reset_task); } void e1000_down(struct e1000_adapter *adapter)
@@ -1262,6 +1258,11 @@ static void e1000_remove(struct pci_dev *pdev) bool disable_dev; e1000_down_and_stop(adapter); + + /* Only kill reset task if adapter is not resetting */ + if (!test_bit(__E1000_RESETTING, &adapter->flags)) + cancel_work_sync(&adapter->reset_task); + e1000_release_manageability(adapter); unregister_netdev(netdev);