Thread (14 messages): 5 authors, 2021-06-30

Re: [PATCH] net: called rtnl_unlock() before runpm resumes devices

From: AceLan Kao <acelan.kao@canonical.com>
Date: 2021-04-23 03:42:32
Also in: lkml

On Thu, Apr 22, 2021 at 3:09 PM, Heiner Kallweit [off-list ref] wrote:
On 22.04.2021 08:30, AceLan Kao wrote:
Yes, I should add

Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
and also
Fixes: 9513d2a5dc7f ("igc: Add legacy power management support")
Please don't top-post. Apart from that:
If the issue was introduced with driver changes, then adding a workaround
in net core may not be the right approach.
It's hard to say which change introduced this issue; we could probably
point a finger at the commit below:
bd869245a3dc net: core: try to runtime-resume detached device in __dev_open

This call path is unusual. In my case, the NIC has no Ethernet cable
plugged in, and we are running networking tests on another NIC in the
system. So removing the rtnl lock from the igb driver would affect
other scenarios.
On Wed, Apr 21, 2021 at 3:27 AM, Jakub Kicinski [off-list ref] wrote:
On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote:
On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao [off-list ref] wrote:
From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>

rtnl_lock() has already been taken in rtnetlink_rcv_msg(); __dev_open()
then calls pm_runtime_resume() to resume the device, and some devices'
resume functions (igb_resume, igc_resume) call rtnl_lock() again. That
leads to a recursive lock, which deadlocks because rtnl_mutex is not
recursive.
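The call chain described above can be modelled in userspace. This is a minimal sketch, not kernel code: model_rtnl is a hypothetical stand-in for rtnl_mutex, and model_rtnetlink_rcv_msg(), model_dev_open() and model_resume() are hypothetical stand-ins for rtnetlink_rcv_msg(), __dev_open() and a resume callback such as igb_resume(). It assumes an error-checking POSIX mutex so that the second lock attempt reports EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for rtnl_mutex. An error-checking
 * mutex returns EDEADLK when its owner tries to lock it again; a
 * kernel mutex would instead block forever. */
static pthread_mutex_t model_rtnl;

static void model_rtnl_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&model_rtnl, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;   /* silence "set but unused" when NDEBUG is defined */
}

/* Hypothetical stand-in for a resume callback that takes rtnl itself.
 * Returns 0 on success or the pthread error code. */
static int model_resume(void)
{
    int rc = pthread_mutex_lock(&model_rtnl);
    if (rc != 0)
        return rc;   /* EDEADLK: this thread already holds the lock */
    /* ... device re-initialisation would happen here ... */
    pthread_mutex_unlock(&model_rtnl);
    return 0;
}

/* Hypothetical stand-in for __dev_open(): runtime-resumes the device
 * while the caller still holds the lock. */
static int model_dev_open(void)
{
    return model_resume();
}

/* Hypothetical stand-in for rtnetlink_rcv_msg(): takes the lock, then
 * opens the device. */
static int model_rtnetlink_rcv_msg(void)
{
    pthread_mutex_lock(&model_rtnl);
    int rc = model_dev_open();
    pthread_mutex_unlock(&model_rtnl);
    return rc;
}
```

Calling model_rtnl_init() and then model_rtnetlink_rcv_msg() returns EDEADLK, the userspace analogue of the hang described in the patch.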

Each device's resume function should be left to decide whether it needs
to call rtnl_lock()/rtnl_unlock(), so in __dev_open(), call rtnl_unlock()
before pm_runtime_resume() and call rtnl_lock() again after it.
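A similar userspace sketch of the proposed change, again with hypothetical names and an assumed error-checking mutex standing in for rtnl_mutex: the __dev_open() stand-in drops the lock around the resume call and re-takes it afterwards, so the resume stand-in can acquire it.

```c
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for rtnl_mutex. */
static pthread_mutex_t model_rtnl;

static void model_rtnl_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&model_rtnl, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;   /* silence "set but unused" when NDEBUG is defined */
}

/* Hypothetical resume callback that takes the lock itself. */
static int model_resume(void)
{
    int rc = pthread_mutex_lock(&model_rtnl);
    if (rc != 0)
        return rc;
    /* ... device re-initialisation would happen here ... */
    pthread_mutex_unlock(&model_rtnl);
    return 0;
}

/* Hypothetical __dev_open() stand-in with the proposed change: release
 * the lock around the resume, then re-acquire it. */
static int model_dev_open_unlocked_resume(void)
{
    pthread_mutex_unlock(&model_rtnl);
    int rc = model_resume();
    /* Window: other lock holders may run here and change state. */
    pthread_mutex_lock(&model_rtnl);
    return rc;
}

/* Hypothetical rtnetlink_rcv_msg() stand-in: caller holds the lock. */
static int model_rtnetlink_rcv_msg(void)
{
    pthread_mutex_lock(&model_rtnl);
    int rc = model_dev_open_unlocked_resume();
    pthread_mutex_unlock(&model_rtnl);
    return rc;
}
```

Here model_rtnetlink_rcv_msg() returns 0. The trade-off is the window between the unlock and the re-lock: any other rtnl holder can run there and change state that the caller of __dev_open() may assume is stable.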
Hi Acelan

When was the bug added?
Please add a Fixes: tag
For the immediate cause, probably:

Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
By doing so, you give reviewers a better chance to understand why the
fix is not risky, and you help the stable teams with their work.
IMO the driver lacks internal locking. Taking rtnl from resume is just
one example; git history shows many more places that lacked locking and
got papered over with rtnl here.
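A sketch of the alternative Jakub points to, assuming "internal locking" means a driver-private lock: the hypothetical struct model_adapter carries its own mutex, so the resume path serialises against suspend without touching rtnl and is safe to call with rtnl already held. This is an illustration only, not how igb or igc are actually structured; model_rtnl again stands in for rtnl_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_mutex. */
static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical driver state guarded by a driver-private lock. */
struct model_adapter {
    pthread_mutex_t lock;   /* protects 'running' */
    bool running;
};

static void model_adapter_init(struct model_adapter *a)
{
    int rc = pthread_mutex_init(&a->lock, NULL);
    assert(rc == 0);
    (void)rc;   /* silence "set but unused" when NDEBUG is defined */
    a->running = false;
}

/* Resume: serialised against suspend by the adapter lock only. */
static void model_resume(struct model_adapter *a)
{
    pthread_mutex_lock(&a->lock);
    a->running = true;
    pthread_mutex_unlock(&a->lock);
}

/* Suspend: takes the same adapter lock, never rtnl. */
static void model_suspend(struct model_adapter *a)
{
    pthread_mutex_lock(&a->lock);
    a->running = false;
    pthread_mutex_unlock(&a->lock);
}

static bool model_is_running(struct model_adapter *a)
{
    pthread_mutex_lock(&a->lock);
    bool up = a->running;
    pthread_mutex_unlock(&a->lock);
    return up;
}

/* Core open path as it stands: rtnl held across the resume call.
 * No self-deadlock, because resume takes only a->lock. */
static bool model_dev_open(struct model_adapter *a)
{
    pthread_mutex_lock(&model_rtnl);
    model_resume(a);
    bool up = model_is_running(a);
    pthread_mutex_unlock(&model_rtnl);
    return up;
}
```

Lock ordering here is rtnl first, then the adapter lock; a real driver would need to keep that order consistent on every path that takes both.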