Re: [PATCH v2] cfg80211: avoid holding the RTNL when calling the driver
From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: 2021-01-22 12:19:03
Also in:
linux-wireless
Hi Johannes, On 19.01.2021 10:21, Johannes Berg wrote:
From: Johannes Berg <redacted> Currently, _everything_ in cfg80211 holds the RTNL, and if you have a slow USB device (or a few) you can get some bad lock contention on that. Fix that by re-adding a mutex to each wiphy/rdev as we had at some point, so we have locking for the wireless_dev lists and all the other things in there, and also so that drivers still don't have to worry too much about it (they still won't get parallel calls for a single device). Then, we can restrict the RTNL to a few cases where we add or remove interfaces and really need the added protection. Some of the global list management still also uses the RTNL, since we need to have it anyway for netdev management, but we only hold the RTNL for very short periods of time here. Signed-off-by: Johannes Berg <redacted>
This patch landed in today's (20210122) linux-next as commit
791daf8fc49a ("cfg80211: avoid holding the RTNL when calling the
driver"). Sadly, it causes deadlock with mwifiex driver. I think that
lockdep report describes it enough:
Bluetooth: vendor=0x2df, device=0x912e, class=255, fn=2
cfg80211: Loading compiled-in X.509 certificates for regulatory database
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Bluetooth: FW download over, size 800344 bytes
btmrvl_sdio mmc2:0001:2: sdio device tree data not available
mwifiex_sdio mmc2:0001:1: WLAN is not the winner! Skip FW dnld
mwifiex_sdio mmc2:0001:1: WLAN FW is active
mwifiex_sdio mmc2:0001:1: CMD_RESP: cmd 0x242 error, result=0x2
mwifiex_sdio mmc2:0001:1: mwifiex_process_cmdresp: cmd 0x242 failed
during initialization
============================================
WARNING: possible recursive locking detected
5.11.0-rc4-00535-g791daf8fc49a #2336 Not tainted
--------------------------------------------
kworker/2:3/108 is trying to acquire lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x2c0/0x49c
[mwifiex]
but task is already holding lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c
[mwifiex]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&rdev->wiphy.mtx);
lock(&rdev->wiphy.mtx);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by kworker/2:3/108:
#0: c1c066a8 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x24c/0x888
#1: deccbf10 ((work_completion)(&fw_work->work)){+.+.}-{0:0}, at:
process_one_work+0x24c/0x888
#2: c13202dc (rtnl_mutex){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x23c/0x49c
[mwifiex]
#3: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at:
_mwifiex_fw_dpc+0x248/0x49c [mwifiex]
stack backtrace:
CPU: 2 PID: 108 Comm: kworker/2:3 Not tainted
5.11.0-rc4-00535-g791daf8fc49a #2336
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events request_firmware_work_func
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b3ad3c>] (dump_stack+0xa4/0xc4)
[<c0b3ad3c>] (dump_stack) from [<c0195fd8>] (__lock_acquire+0xc20/0x31cc)
[<c0195fd8>] (__lock_acquire) from [<c019923c>] (lock_acquire+0x2e4/0x5dc)
[<c019923c>] (lock_acquire) from [<c0b4217c>] (__mutex_lock+0xa4/0xb60)
[<c0b4217c>] (__mutex_lock) from [<c0b42c54>] (mutex_lock_nested+0x1c/0x24)
[<c0b42c54>] (mutex_lock_nested) from [<bf1c87f8>]
(_mwifiex_fw_dpc+0x2c0/0x49c [mwifiex])
[<bf1c87f8>] (_mwifiex_fw_dpc [mwifiex]) from [<c06bfd18>]
(request_firmware_work_func+0x58/0x94)
[<c06bfd18>] (request_firmware_work_func) from [<c0149d48>]
(process_one_work+0x30c/0x888)
[<c0149d48>] (process_one_work) from [<c014a31c>] (worker_thread+0x58/0x594)
[<c014a31c>] (worker_thread) from [<c0151284>] (kthread+0x154/0x19c)
[<c0151284>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xdeccbfb0 to 0xdeccbff8)
...
> ...
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland