RE: [EXTERNAL] [PATCH 1/6] PCI: hv: fix a race condition bug in hv_pci_query_relations()
From: Saurabh Singh Sengar <hidden>
Date: 2023-03-28 05:29:19
Also in:
linux-pci, linux-rdma, lkml, netdev
quoted hunk ↗ jump to hunk
-----Original Message----- From: Dexuan Cui <decui@microsoft.com> Sent: Tuesday, March 28, 2023 10:21 AM To: bhelgaas@google.com; davem@davemloft.net; Dexuan Cui [off-list ref]; edumazet@google.com; Haiyang Zhang [off-list ref]; Jake Oshins [off-list ref]; kuba@kernel.org; kw@linux.com; KY Srinivasan [off-list ref]; leon@kernel.org; linux-pci@vger.kernel.org; lpieralisi@kernel.org; Michael Kelley (LINUX) [off-list ref]; pabeni@redhat.com; robh@kernel.org; saeedm@nvidia.com; wei.liu@kernel.org; Long Li [off-list ref]; boqun.feng@gmail.com Cc: linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; linux- rdma@vger.kernel.org; netdev@vger.kernel.org Subject: [EXTERNAL] [PATCH 1/6] PCI: hv: fix a race condition bug in hv_pci_query_relations() Fix the longstanding race between hv_pci_query_relations() and survey_child_resources() by flushing the workqueue before we exit from hv_pci_query_relations(). Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") Signed-off-by: Dexuan Cui <decui@microsoft.com> --- drivers/pci/controller/pci-hyperv.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) With the below debug code:@@ -2103,6 +2103,8 @@ static void survey_child_resources(structhv_pcibus_device *hbus) } spin_unlock_irqrestore(&hbus->device_list_lock, flags); + ssleep(15); + printk("%s: completing %px\n", __func__, event); complete(event); }@@ -3305,8 +3307,12 @@ static int hv_pci_query_relations(struct hv_device*hdev) ret = vmbus_sendpacket(hdev->channel, &message, sizeof(message), 0, VM_PKT_DATA_INBAND, 0); - if (!ret) + if (!ret) { + ssleep(10); // unassign the PCI device on the host during the 10s ret = wait_for_response(hdev, &comp); + printk("%s: comp=%px is becoming invalid! ret=%d\n", + __func__, &comp, ret); + } return ret; }@@ -3635,6 +3641,8 @@ static int hv_pci_probe(struct hv_device *hdev, retry: ret = hv_pci_query_relations(hdev); + printk("hv_pci_query_relations() exited\n");
Can we use pr_* or the appropriate KERN_<LEVEL> in all the printk(s).
quoted hunk ↗ jump to hunk
+ if (ret) goto free_irq_domain; I'm able to repro the below hang issue: [ 74.544744] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: PCI VMBus probing: Using version 0x10004 [ 76.886944] hv_netvsc 818fe754-b912-4445-af51-1f584812e3c9 eth0: VF slot 1 removed [ 84.788266] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: The device is gone. [ 84.792586] hv_pci_query_relations: comp=ffffa7504012fb58 is becoming invalid! ret=-19 [ 84.797505] hv_pci_query_relations() exited [ 89.652268] survey_child_resources: completing ffffa7504012fb58 [ 150.392242] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 150.398447] rcu: 15-...0: (2 ticks this GP) idle=867c/1/0x4000000000000000 softirq=947/947 fqs=5234 [ 150.405851] rcu: (detected by 14, t=15004 jiffies, g=2553, q=4833 ncpus=16) [ 150.410870] Sending NMI from CPU 14 to CPUs 15: [ 150.414836] NMI backtrace for cpu 15 [ 150.414840] CPU: 15 PID: 10 Comm: kworker/u32:0 Tainted: G W E 6.3.0-rc3-decui-dirty #34 ... [ 150.414849] Workqueue: hv_pci_468b pci_devices_present_work [pci_hyperv] [ 150.414866] RIP: 0010:__pv_queued_spin_lock_slowpath+0x10f/0x3c0 ... [ 150.414905] Call Trace: [ 150.414907] <TASK> [ 150.414911] _raw_spin_lock_irqsave+0x40/0x50 [ 150.414917] complete+0x1d/0x60 [ 150.414924] pci_devices_present_work+0x5dd/0x680 [pci_hyperv] [ 150.414946] process_one_work+0x21f/0x430 [ 150.414952] worker_thread+0x4a/0x3c0 With this patch, the hang issue goes away: [ 186.143612] hv_pci b92a0085-468b-407a-a88a-d33fac8edc75: The device is gone. [ 186.148034] hv_pci_query_relations: comp=ffffa7cfd0aa3b50 is becoming invalid! ret=-19 [ 191.263611] survey_child_resources: completing ffffa7cfd0aa3b50 [ 191.267732] hv_pci_query_relations() exiteddiff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index f33370b75628..b82c7cde19e6 100644--- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c@@ -3308,6 +3308,19 @@ static int hv_pci_query_relations(struct hv_device*hdev) if (!ret) ret = wait_for_response(hdev, &comp); + /* + * In the case of fast device addition/removal, it's possible that + * vmbus_sendpacket() or wait_for_response() returns -ENODEV but we + * already got a PCI_BUS_RELATIONS* message from the host and the + * channel callback already scheduled a work to hbus->wq, which can be + * running survey_child_resources() -> complete(&hbus-quoted
survey_event),+ * even after hv_pci_query_relations() exits and the stack variable + * 'comp' is no longer valid. This can cause a strange hang issue + * or sometimes a page fault. Flush hbus->wq before we exit from + * hv_pci_query_relations() to avoid the issues. + */ + flush_workqueue(hbus->wq); + return ret; } -- 2.25.1