Thread: 6 messages, 4 authors, 2025-05-30

Re: [Bug] "possible deadlock in rtnl_newlink" in Linux kernel v6.13

From: Jacob Keller <jacob.e.keller@intel.com>
Date: 2025-05-22 23:05:18
Also in: lkml


On 5/21/2025 5:52 PM, John wrote:
> Dear Linux Kernel Maintainers,
>
> I hope this message finds you well.
>
> I am writing to report a potential vulnerability I encountered during
> testing of Linux kernel v6.13.
>
> Git Commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 (tag: v6.13)
>
> Bug Location: rtnl_newlink+0x86c/0x1dd0 net/core/rtnetlink.c:4011
>
> Bug report: https://hastebin.com/share/ajavibofik.bash
>
> Complete log: https://hastebin.com/share/derufumuxu.perl
>
> Entire kernel config: https://hastebin.com/share/lovayaqidu.ini
>
> Root Cause Analysis:
> The deadlock warning is caused by a circular locking dependency
> between two subsystems:
>
> Path A (CPU 0):
> Holds rtnl_mutex in rtnl_newlink() →
> Then calls e1000_close() →
> Triggers e1000_down_and_stop() →
> Calls __cancel_work_sync() →
> Tries to flush adapter->reset_task (→ needs work_completion lock)
>
> Path B (CPU 1):
> Holds work_completion lock while running e1000_reset_task() →
> Then calls e1000_down() →
> Which tries to acquire rtnl_mutex
> These two execution paths result in a circular dependency:

I guess this implies you can't call cancel_work_sync() while holding the
RTNL lock? Hmm. Or maybe it's because calling e1000_down() from
e1000_reset_task() is a problem.

> CPU 0: rtnl_mutex → work_completion
> CPU 1: work_completion → rtnl_mutex
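For readers less familiar with lockdep: a "possible deadlock" report does not mean the machine actually hung. lockdep records the order in which lock classes are taken and warns as soon as a newly observed order closes a cycle with orders seen earlier, even if those were seen on another CPU at another time. The workqueue code annotates a running work item as holding a pseudo-lock for that work (the `work_completion` entry in the report), and waiting for it in cancel_work_sync() is annotated as acquiring the same pseudo-lock; that is how the two paths above become the edges rtnl_mutex → work_completion and work_completion → rtnl_mutex. The C below is a toy model of that cycle check, a sketch only: `lock_class_id()`, `reachable()` and `record_order()` are hypothetical helpers, and real lockdep is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy lock-order checker, loosely in the spirit of lockdep.
 * All names here are hypothetical; this is not a kernel API. */

#define MAX_LOCKS 16

static const char *lock_names[MAX_LOCKS];
static int num_locks;
/* order[a][b] is true once lock class b was acquired while a was held. */
static bool order[MAX_LOCKS][MAX_LOCKS];

/* Map a lock-class name to a small id; returns -1 if the table is full. */
static int lock_class_id(const char *name)
{
    for (int i = 0; i < num_locks; i++)
        if (strcmp(lock_names[i], name) == 0)
            return i;
    if (num_locks == MAX_LOCKS)
        return -1;
    lock_names[num_locks] = name;
    return num_locks++;
}

/* Depth-first search over recorded edges: is `to` reachable from `from`? */
static bool reachable(int from, int to, bool visited[])
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < num_locks; next++)
        if (order[from][next] && !visited[next] &&
            reachable(next, to, visited))
            return true;
    return false;
}

/* Record "took `inner` while holding `outer`".
 * Returns 1 if the edge was recorded, 0 if it would close a cycle
 * (possible deadlock; the edge is not recorded), -1 if out of slots. */
static int record_order(const char *outer, const char *inner)
{
    int a = lock_class_id(outer);
    int b = lock_class_id(inner);
    bool visited[MAX_LOCKS] = { false };

    if (a < 0 || b < 0)
        return -1;
    if (reachable(b, a, visited)) {
        printf("possible circular locking dependency: %s -> %s\n",
               outer, inner);
        return 0;
    }
    order[a][b] = true;
    return 1;
}
```

Feeding it Path A's edge first, `record_order("rtnl_mutex", "work_completion")` returns 1; feeding it Path B's edge afterwards, `record_order("work_completion", "rtnl_mutex")` returns 0. Neither order is a problem on its own; it is the pair that lockdep flags.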

> This violates lock ordering and can lead to a deadlock under contention.
> This bug represents a classic case of lock inversion between the
> networking core (rtnl_mutex) and a device driver (the e1000 reset
> work item).
> It is a design-level concurrency flaw that can lead to deadlocks under
> stress or fuzzing workloads.
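The same inversion can be reproduced in miniature in userspace. This is an analogy under stated assumptions, not e1000 code: a pthread mutex `big_lock` stands in for rtnl_mutex, the thread running `reset_worker()` stands in for adapter->reset_task, and pthread_join() stands in for cancel_work_sync(). A thread cannot be cancelled the way a still-pending work item can, so this corresponds to the case where the reset task is already executing when the close path starts waiting. The worker uses pthread_mutex_timedlock() with a one-second deadline so the demo reports the inversion as ETIMEDOUT instead of hanging forever. `big_lock`, `reset_worker()` and `close_device()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace analogy (hypothetical names):
 *   big_lock       ~ rtnl_mutex
 *   reset_worker() ~ e1000_reset_task() needing RTNL
 *   pthread_join() ~ cancel_work_sync() waiting for the running work */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static void *reset_worker(void *arg)
{
    int *result = arg;
    struct timespec deadline;

    /* Bounded wait so the demo reports the inversion instead of hanging. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;

    *result = pthread_mutex_timedlock(&big_lock, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&big_lock);   /* "reset" finished */
    return NULL;
}

/* Close path. wait_under_lock == true mirrors Path A: wait for the worker
 * while still holding big_lock. Returns the worker's lock result
 * (0 = got the lock, ETIMEDOUT = was blocked by the waiter), or -1 if
 * the worker thread could not be created. */
static int close_device(bool wait_under_lock)
{
    pthread_t worker;
    int worker_result = -1;

    pthread_mutex_lock(&big_lock);
    if (pthread_create(&worker, NULL, reset_worker, &worker_result) != 0) {
        pthread_mutex_unlock(&big_lock);
        return -1;
    }

    if (!wait_under_lock)
        pthread_mutex_unlock(&big_lock);   /* drop the lock before waiting */

    pthread_join(worker, NULL);            /* wait for the "reset task" */

    if (wait_under_lock)
        pthread_mutex_unlock(&big_lock);
    return worker_result;
}
```

`close_device(true)` models Path A as reported: the worker never gets the lock, which in the kernel would be a permanent hang. `close_device(false)` drops the lock before waiting and the worker completes. Releasing the lock before waiting is only one way to break the cycle; the other option raised in the reply, not taking RTNL from inside the reset task, removes the opposite edge instead. Which of the two suits e1000 is a driver decision this sketch does not settle.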

> At present, I have not yet obtained a minimal reproducer for this
> issue. However, I am actively working on reproducing it, and I will
> promptly share any additional findings or a working reproducer as soon
> as it becomes available.

This is likely a regression in e400c7444d84 ("e1000: Hold RTNL when
e1000_down can be called").

@Joe, thoughts?

> Thank you very much for your time and attention to this matter. I
> truly appreciate the efforts of the Linux kernel community.
>
> Best regards,
> John
  