Re: [PATCH] usb: hub: Fix usb enumeration issue due to address0 race
From: Marek Szyprowski <m.szyprowski@samsung.com>
Date: 2021-11-18 11:21:29
Also in:
linux-usb
Hi, On 15.11.2021 23:16, Mathias Nyman wrote:
xHC hardware can only have one slot in default state with address 0 waiting for a unique address at a time, otherwise "undefined behavior may occur" according to xhci spec 5.4.3.4 The address0_mutex exists to prevent this across both xhci roothubs. If hub_port_init() fails, it may unlock the mutex and exit with a xhci slot in default state. If the other xhci roothub calls hub_port_init() at this point we end up with two slots in default state. Make sure the address0_mutex protects the slot default state across hub_port_init() retries, until slot is addressed or disabled. Note, one known minor case is not fixed by this patch. If device needs to be reset during resume, but fails all hub_port_init() retries in usb_reset_and_verify_device(), then it's possible the slot is still left in default state when address0_mutex is unlocked. Cc: <redacted> Signed-off-by: Mathias Nyman <redacted>
This patch landed in linux next-20211118 as commit 6ae6dc22d2d1 ("usb:
hub: Fix usb enumeration issue due to address0 race"). On my test
systems it triggers the following deplock warning during system
suspend/resume cycle:
======================================================
WARNING: possible circular locking dependency detected
5.16.0-rc1-00014-g6ae6dc22d2d1 #4126 Not tainted
------------------------------------------------------
kworker/u16:8/738 is trying to acquire lock:
cf81f738 (hcd->address0_mutex){+.+.}-{3:3}, at:
usb_reset_and_verify_device+0xe8/0x3e4
but task is already holding lock:
cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at:
usb_port_resume+0xa0/0x7e8
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&port_dev->status_lock){+.+.}-{3:3}:
mutex_lock_nested+0x1c/0x24
hub_event+0x824/0x1758
process_one_work+0x2c8/0x7ec
worker_thread+0x50/0x584
kthread+0x13c/0x19c
ret_from_fork+0x14/0x2c
0x0
-> #0 (hcd->address0_mutex){+.+.}-{3:3}:
lock_acquire+0x2a0/0x42c
__mutex_lock+0x94/0xaa8
mutex_lock_nested+0x1c/0x24
usb_reset_and_verify_device+0xe8/0x3e4
usb_port_resume+0x4e0/0x7e8
usb_generic_driver_resume+0x18/0x40
usb_resume_both+0x120/0x164
usb_resume+0x14/0x60
usb_dev_resume+0xc/0x10
dpm_run_callback+0x98/0x32c
device_resume+0xb4/0x258
async_resume+0x20/0x64
async_run_entry_fn+0x40/0x15c
process_one_work+0x2c8/0x7ec
worker_thread+0x50/0x584
kthread+0x13c/0x19c
ret_from_fork+0x14/0x2c
0x0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&port_dev->status_lock);
lock(hcd->address0_mutex);
lock(&port_dev->status_lock);
lock(hcd->address0_mutex);
*** DEADLOCK ***
4 locks held by kworker/u16:8/738:
#0: c1c08ca8 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
process_one_work+0x21c/0x7ec
#1: cd9cff18 ((work_completion)(&entry->work)){+.+.}-{0:0}, at:
process_one_work+0x21c/0x7ec
#2: cf810148 (&dev->mutex){....}-{3:3}, at: device_resume+0x60/0x258
#3: cf80ab3c (&port_dev->status_lock){+.+.}-{3:3}, at:
usb_port_resume+0xa0/0x7e8
stack backtrace:
CPU: 4 PID: 738 Comm: kworker/u16:8 Not tainted
5.16.0-rc1-00014-g6ae6dc22d2d1 #4126
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound async_run_entry_fn
[<c01110d0>] (unwind_backtrace) from [<c010cab8>] (show_stack+0x10/0x14)
[<c010cab8>] (show_stack) from [<c0b7427c>] (dump_stack_lvl+0x58/0x70)
[<c0b7427c>] (dump_stack_lvl) from [<c0192958>]
(check_noncircular+0x144/0x15c)
[<c0192958>] (check_noncircular) from [<c0195e68>]
(__lock_acquire+0x17e4/0x3180)
[<c0195e68>] (__lock_acquire) from [<c01983ec>] (lock_acquire+0x2a0/0x42c)
[<c01983ec>] (lock_acquire) from [<c0b7ba80>] (__mutex_lock+0x94/0xaa8)
[<c0b7ba80>] (__mutex_lock) from [<c0b7c4b0>] (mutex_lock_nested+0x1c/0x24)
[<c0b7c4b0>] (mutex_lock_nested) from [<c077bf9c>]
(usb_reset_and_verify_device+0xe8/0x3e4)
[<c077bf9c>] (usb_reset_and_verify_device) from [<c077e79c>]
(usb_port_resume+0x4e0/0x7e8)
[<c077e79c>] (usb_port_resume) from [<c07941f0>]
(usb_generic_driver_resume+0x18/0x40)
[<c07941f0>] (usb_generic_driver_resume) from [<c0789a40>]
(usb_resume_both+0x120/0x164)
[<c0789a40>] (usb_resume_both) from [<c078a854>] (usb_resume+0x14/0x60)
[<c078a854>] (usb_resume) from [<c0777fa0>] (usb_dev_resume+0xc/0x10)
[<c0777fa0>] (usb_dev_resume) from [<c06ca1b8>]
(dpm_run_callback+0x98/0x32c)
[<c06ca1b8>] (dpm_run_callback) from [<c06ca500>] (device_resume+0xb4/0x258)
[<c06ca500>] (device_resume) from [<c06ca6c4>] (async_resume+0x20/0x64)
[<c06ca6c4>] (async_resume) from [<c01548bc>]
(async_run_entry_fn+0x40/0x15c)
[<c01548bc>] (async_run_entry_fn) from [<c0148a68>]
(process_one_work+0x2c8/0x7ec)
[<c0148a68>] (process_one_work) from [<c0148fdc>] (worker_thread+0x50/0x584)
[<c0148fdc>] (worker_thread) from [<c0151274>] (kthread+0x13c/0x19c)
[<c0151274>] (kthread) from [<c0100108>] (ret_from_fork+0x14/0x2c)
Exception stack(0xcd9cffb0 to 0xcd9cfff8)
ffa0: 00000000 00000000 00000000
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
usb 1-1: reset high-speed USB device number 2 using exynos-ehci
usb 1-1.1: reset high-speed USB device number 3 using exynos-ehci
usb usb3: root hub lost power or was reset
usb usb4: root hub lost power or was reset
s3c-rtc 101e0000.rtc: rtc disabled, re-enabling
smsc95xx 1-1.1:1.0 eth0: Link is Down
OOM killer enabled.
Restarting tasks ... done.
PM: suspend exit
Reverting it on top of next-20211118 fixes/hides this warning. Let me
know if I can help somehow fixing this issue.
quoted hunk ↗ jump to hunk
--- drivers/usb/core/hub.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 86658a81d284..00c3506324e4 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c@@ -4700,8 +4700,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; - mutex_lock(hcd->address0_mutex); - /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ retval = hub_port_reset(hub, port1, udev, delay, false);@@ -5016,7 +5014,6 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, hub_port_disable(hub, port1, 0); update_devnum(udev, devnum); /* for disconnect processing */ } - mutex_unlock(hcd->address0_mutex); return retval; }@@ -5246,6 +5243,9 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, unit_load = 100; status = 0; + + mutex_lock(hcd->address0_mutex); + for (i = 0; i < PORT_INIT_TRIES; i++) { /* reallocate for each attempt, since references@@ -5282,6 +5282,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, if (status < 0) goto loop; + mutex_unlock(hcd->address0_mutex); + if (udev->quirks & USB_QUIRK_DELAY_INIT) msleep(2000);@@ -5370,6 +5372,7 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, loop_disable: hub_port_disable(hub, port1, 1); + mutex_lock(hcd->address0_mutex); loop: usb_ep0_reinit(udev); release_devnum(udev);@@ -5396,6 +5399,8 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, } done: + mutex_unlock(hcd->address0_mutex); + hub_port_disable(hub, port1, 1); if (hcd->driver->relinquish_port && !hub->hdev->parent) { if (status != -ENOTCONN && status != -ENODEV)@@ -5915,6 +5920,8 @@ static int usb_reset_and_verify_device(struct usb_device *udev) bos = udev->bos; udev->bos = NULL; + mutex_lock(hcd->address0_mutex); + for (i = 0; i < PORT_INIT_TRIES; ++i) { /* ep0 maxpacket size may change; let the HCD know about it.@@ -5924,6 +5931,7 @@ static int usb_reset_and_verify_device(struct usb_device *udev) if (ret >= 0 || ret == -ENOTCONN || ret == -ENODEV) break; } + mutex_unlock(hcd->address0_mutex); if (ret < 0) goto re_enumerate;
Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland