Thread: 8 messages, 2 authors, 2021-01-26

Re: [PATCH v2] cfg80211: avoid holding the RTNL when calling the driver

From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: 2021-01-22 12:19:03
Also in: linux-wireless

Hi Johannes,

On 19.01.2021 10:21, Johannes Berg wrote:
> From: Johannes Berg <redacted>
>
> Currently, _everything_ in cfg80211 holds the RTNL, and if you
> have a slow USB device (or a few) you can get some bad lock
> contention on that.
>
> Fix that by re-adding a mutex to each wiphy/rdev as we had at
> some point, so we have locking for the wireless_dev lists and
> all the other things in there, and also so that drivers still
> don't have to worry too much about it (they still won't get
> parallel calls for a single device).
>
> Then, we can restrict the RTNL to a few cases where we add or
> remove interfaces and really need the added protection. Some
> of the global list management still also uses the RTNL, since
> we need to have it anyway for netdev management, but we only
> hold the RTNL for very short periods of time here.
>
> Signed-off-by: Johannes Berg <redacted>

This patch landed in today's (20210122) linux-next as commit
791daf8fc49a ("cfg80211: avoid holding the RTNL when calling the
driver"). Sadly, it causes a deadlock with the mwifiex driver. I think
the lockdep report describes it well enough:

Bluetooth: vendor=0x2df, device=0x912e, class=255, fn=2
cfg80211: Loading compiled-in X.509 certificates for regulatory database
cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Bluetooth: FW download over, size 800344 bytes
btmrvl_sdio mmc2:0001:2: sdio device tree data not available
mwifiex_sdio mmc2:0001:1: WLAN is not the winner! Skip FW dnld
mwifiex_sdio mmc2:0001:1: WLAN FW is active
mwifiex_sdio mmc2:0001:1: CMD_RESP: cmd 0x242 error, result=0x2
mwifiex_sdio mmc2:0001:1: mwifiex_process_cmdresp: cmd 0x242 failed during initialization

============================================
WARNING: possible recursive locking detected
5.11.0-rc4-00535-g791daf8fc49a #2336 Not tainted
--------------------------------------------
kworker/2:3/108 is trying to acquire lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x2c0/0x49c [mwifiex]

but task is already holding lock:
c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c [mwifiex]

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&rdev->wiphy.mtx);
   lock(&rdev->wiphy.mtx);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by kworker/2:3/108:
  #0: c1c066a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x24c/0x888
  #1: deccbf10 ((work_completion)(&fw_work->work)){+.+.}-{0:0}, at: process_one_work+0x24c/0x888
  #2: c13202dc (rtnl_mutex){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x23c/0x49c [mwifiex]
  #3: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c [mwifiex]

stack backtrace:
CPU: 2 PID: 108 Comm: kworker/2:3 Not tainted 5.11.0-rc4-00535-g791daf8fc49a #2336
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events request_firmware_work_func
[<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14)
[<c010cf58>] (show_stack) from [<c0b3ad3c>] (dump_stack+0xa4/0xc4)
[<c0b3ad3c>] (dump_stack) from [<c0195fd8>] (__lock_acquire+0xc20/0x31cc)
[<c0195fd8>] (__lock_acquire) from [<c019923c>] (lock_acquire+0x2e4/0x5dc)
[<c019923c>] (lock_acquire) from [<c0b4217c>] (__mutex_lock+0xa4/0xb60)
[<c0b4217c>] (__mutex_lock) from [<c0b42c54>] (mutex_lock_nested+0x1c/0x24)
[<c0b42c54>] (mutex_lock_nested) from [<bf1c87f8>] (_mwifiex_fw_dpc+0x2c0/0x49c [mwifiex])
[<bf1c87f8>] (_mwifiex_fw_dpc [mwifiex]) from [<c06bfd18>] (request_firmware_work_func+0x58/0x94)
[<c06bfd18>] (request_firmware_work_func) from [<c0149d48>] (process_one_work+0x30c/0x888)
[<c0149d48>] (process_one_work) from [<c014a31c>] (worker_thread+0x58/0x594)
[<c014a31c>] (worker_thread) from [<c0151284>] (kthread+0x154/0x19c)
[<c0151284>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38)
Exception stack(0xdeccbfb0 to 0xdeccbff8)
...

 > ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland