Re: [PATCH] net: called rtnl_unlock() before runpm resumes devices

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: 2021-04-27 06:18:45
Also in: lkml

On 27.04.2021 03:58, AceLan Kao wrote:

I got another issue ticket with the same dump_stack with igc NIC.
To fix this on the device side would lead to many other issues and we
have to check on the device side if the calling path contains
rtnl_lock already.

I think it's not clean that the igb/igc runtime_resume handlers call
__igb_open() before ndo_open(). I understand why it works in the
drivers, but I'd consider it a little bit hacky. And there's no
need for the drivers to acquire rtnl_lock before ndo_open().

I still think we should not keep rtnl locked when calling functions
outside of network stack.

Well, the runtime pm handlers of a MAC driver are within the network
stack. I understand that your proposed change is the easiest way to
work around the issue you're facing. I'm not sure it's the best way
however, but I don't have a strong opinion here and would leave it
to the maintainers which option they prefer.

Heiner Kallweit [off-list ref] 於 2021年4月26日 週一 下午4:42寫道：

quoted

On 26.04.2021 09:36, AceLan Kao wrote:

quoted

Heiner Kallweit [off-list ref] 於 2021年4月25日 週日 上午4:07寫道：

quoted

On 23.04.2021 05:42, AceLan Kao wrote:

quoted

Heiner Kallweit [off-list ref] 於 2021年4月22日 週四 下午3:09寫道：

quoted

On 22.04.2021 08:30, AceLan Kao wrote:

quoted

Yes, should add

Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")
and also
Fixes: 9513d2a5dc7f ("igc: Add legacy power management support")

Please don't top-post. Apart from that:
If the issue was introduced with driver changes, then adding a workaround
in net core may not be the right approach.

It's hard to say who introduces this issue, we probably could point
our finger to below commit
bd869245a3dc net: core: try to runtime-resume detached device in __dev_open

This calling path is not usual, in my case, the NIC is not plugged in
any Ethernet cable,
and we are doing networking tests on another NIC on the system. So,
remove the rtnl lock from igb driver will affect other scenarios.

quoted

Jakub Kicinski [off-list ref] 於 2021年4月21日 週三 上午3:27寫道：

quoted

On Tue, 20 Apr 2021 10:34:17 +0200 Eric Dumazet wrote:

quoted

On Tue, Apr 20, 2021 at 9:54 AM AceLan Kao [off-list ref] wrote:

quoted

From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>

The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
__dev_open() it calls pm_runtime_resume() to resume devices, and in
some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
again. That leads to a recursive lock.

It should leave the devices' resume function to decide if they need to
call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling
pm_runtime_resume() and then call rtnl_lock() after it in __dev_open().

Hi Acelan

When was the bugg added ?
Please add a Fixes: tag

For immediate cause probably:

Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach")

quoted

By doing so, you give more chances for reviewers to understand why the
fix is not risky,
and help stable teams work.

IMO the driver lacks internal locking. Taking 看rtnl from resume is just
one example, git history shows many more places that lacked locking and
got papered over with rtnl here.

You could alternatively try the following. It should avoid the deadlock,
and when runtime-resuming if __IGB_DOWN is set all we do is marking the
net_device as present (because of PCI D3 -> D0 transition).
I do basically the same in r8169 and it works as intended.

Disclaimer: I don't have an igb-driven device and therefore can't test
the proposal.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 038a9fd1a..21436626a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c

@@ -9300,6 +9300,14 @@ static int __maybe_unused igb_runtime_suspend(struct device *dev)

 static int __maybe_unused igb_runtime_resume(struct device *dev)
 {
+       struct net_device *netdev = dev_get_drvdata(dev);
+       struct igb_adapter *adapter = netdev_priv(netdev);
+
+       if (test_bit(__IGB_DOWN, &adapter->state)) {
+               netif_device_attach(netdev);
+               return 0;
+       }
+
        return igb_resume(dev);
 }

--

2.31.1

Hi Heiner,

I encountered below error after applied your patch.

Presumably similar changes are needed also for the runtime_suspend callback.
If __IGB_DOWN is set, I think just the net_device needs to be detached.

quoted

[  121.489970] u kernel: ------------[ cut here ]------------
[  121.489979] u kernel: igb 0000:05:00.0: disabling already-disabled device
[  121.490008] u kernel: WARNING: CPU: 7 PID: 258 at
drivers/pci/pci.c:2146 pci_disable_device+0x91/0xb0
[  121.490028] u kernel: Modules linked in: rfcomm cmac algif_hash
algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth
ecdh_generic ecc joydev input_leds inte
l_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp
coretemp ath10k_pci ath10k_core kvm_intel ath mac80211 kvm
snd_sof_pci_intel_tgl snd_soc_acpi_intel_ma
tch snd_sof_intel_hda_common nls_iso8859_1 soundwire_intel
soundwire_generic_allocation soundwire_cadence soundwire_bus
snd_sof_pci snd_soc_acpi snd_sof snd_soc_core snd
_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi
crct10dif_pclmul crc32_pclmul snd_sof_xtensa_dsp ghash_clmulni_intel
ledtrig_audio aesni_intel snd_hda_intel
libarc4 crypto_simd snd_intel_dspcfg snd_intel_sdw_acpi cryptd
snd_hda_codec cfg80211 mei_hdcp snd_hwdep snd_hda_core
intel_wmi_thunderbolt snd_pcm wmi_bmof snd_seq inte
l_cstate efi_pstore snd_timer snd_seq_device ee1004 mei_me snd mei
ucsi_acpi soundcore typec_ucsi typec wmi mac_hid acpi_pad acpi_tad
sch_fq_codel
[  121.490314] u kernel:  parport_pc ppdev lp parport ip_tables
x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_
pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath
linear hid_generic usbhid hid i915 drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops cec rc
_core igb drm nvme e1000e nvme_core i2c_i801 dca i2c_smbus
i2c_algo_bit intel_lpss_pci intel_lpss ahci idma64 video xhci_pci
libahci virt_dma xhci_pci_renesas pinctrl_ti
gerlake
[  121.490508] u kernel: CPU: 7 PID: 258 Comm: kworker/7:2 Tainted: G
   U            5.12.0-rc7+ #79
[  121.490518] u kernel: Hardware name: Dell Inc. OptiPlex 7090/, BIOS
0.12.80 02/23/2021
[  121.490525] u kernel: Workqueue: pm pm_runtime_work
[  121.490540] u kernel: RIP: 0010:pci_disable_device+0x91/0xb0
[  121.490550] u kernel: Code: 4d 85 e4 75 07 4c 8b a3 c8 00 00 00 48
8d bb c8 00 00 00 e8 61 8d 17 00 4c 89 e2 48 c7 c7 60 5a e0 a5 48 89
c6 e8 9b a3 59 00 <0f> 0b eb 8
d 48 89 df e8 e3 fe ff ff 80 a3 49 0a 00 00 df 5b 41 5c
[  121.490558] u kernel: RSP: 0018:ffffb76b4169fc90 EFLAGS: 00010286
[  121.490569] u kernel: RAX: 0000000000000000 RBX: ffff9e2581ee6000
RCX: 0000000000000027
[  121.490576] u kernel: RDX: 0000000000000027 RSI: ffffffffa493bca0
RDI: ffff9e27073e89b8
[  121.490582] u kernel: RBP: ffffb76b4169fca0 R08: ffff9e27073e89b0
R09: 0000000000000000
[  121.490588] u kernel: R10: 0000000000000000 R11: 0000000000000001
R12: ffff9e2581af7c80
[  121.490594] u kernel: R13: ffff9e2581ee6000 R14: ffff9e25a0914000
R15: ffff9e25a0915280
[  121.490600] u kernel: FS:  0000000000000000(0000)
GS:ffff9e2707200000(0000) knlGS:0000000000000000
[  121.490608] u kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  121.490614] u kernel: CR2: 00007ff86ec8d024 CR3: 0000000189c28002
CR4: 0000000000770ee0
[  121.490621] u kernel: PKRU: 55555554
[  121.490626] u kernel: Call Trace:
[  121.490638] u kernel:  __igb_shutdown+0xf2/0x1c0 [igb]
[  121.490676] u kernel:  igb_runtime_suspend+0x1c/0x20 [igb]
[  121.490703] u kernel:  pci_pm_runtime_suspend+0x63/0x180
[  121.490715] u kernel:  ? pci_pm_runtime_resume+0x90/0x90
[  121.490727] u kernel:  __rpm_callback+0xc7/0x140
[  121.490740] u kernel:  rpm_callback+0x57/0x80
[  121.490750] u kernel:  ? pci_pm_runtime_resume+0x90/0x90
[  121.490759] u kernel:  rpm_suspend+0x119/0x640
[  121.490774] u kernel:  pm_runtime_work+0x64/0xc0
[  121.490784] u kernel:  process_one_work+0x2af/0x5d0
[  121.490803] u kernel:  worker_thread+0x4d/0x3e0
[  121.490814] u kernel:  ? process_one_work+0x5d0/0x5d0
[  121.490825] u kernel:  kthread+0x12a/0x160
[  121.490834] u kernel:  ? kthread_park+0x90/0x90
[  121.490844] u kernel:  ret_from_fork+0x1f/0x30
[  121.490867] u kernel: irq event stamp: 0[  121.490871] u kernel:
hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  121.490916] u kernel: hardirqs last disabled at (0):
[<ffffffffa489ea44>] copy_process+0x714/0x1cc0
[  121.490929] u kernel: softirqs last  enabled at (0):
[<ffffffffa489ea44>] copy_process+0x714/0x1cc0
[  121.490938] u kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0
[  121.490949] u kernel: ---[ end trace a9c7ffc27c226979 ]---

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help