Thread (18 messages) 18 messages, 3 authors, 2025-08-28

Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2025-08-22 10:21:59
Also in: stable

On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").

Virtio drivers and PCI devices have never fully supported true
surprise (aka hot unplug) removal. Drivers historically continued
processing and waiting for pending I/O and even continued synchronous
device reset during surprise removal. Devices have also continued
completing I/Os, doing DMA and allowing device reset after surprise
removal to support such drivers.

Supporting it correctly would require a new device capability
If a device is removed, it is removed. Windows drivers supported
this since forever and it's just a Linux bug that it does not
handle all the cases. This is not something you can handle
with a capability.




and
driver negotiation in the virtio specification to safely stop
I/O and free queue memory. Failure to do so either breaks all the
existing drivers with call trace listed in the commit or crashes the
host on continuing the DMA.
If the device is gone, then DMA does not continue.

IIUC what is going on for you, is that you have developed a surprise
removal emulation that pretends to remove the device but
actually the device is doing DMA. So of course things break then.
Hence, until such specification and devices
are invented, restore the previous behavior of treating surprise
removal as graceful removal to avoid regressions and maintain system
stability same as before the
commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").

As explained above, previous analysis of solving this only in driver
was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
is still the best stand to restore failures of virtio net and
block devices.

[1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t (local)

I can only repeat what I said then, this is not how we do kernel
development.
[2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/ (local)
What was missing here, is handling corner cases. So let us please 
try to handle them.

Here is how I would try to do it:

- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, probe that the device is still present.
  if not, invoke a driver callback
- cancel the task on device reset

If you do not have the time, let me know and I will try to look into it.
quoted hunk ↗ jump to hunk
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable@vger.kernel.org
Reported-by: lirongqing@baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/ (local)
Signed-off-by: Parav Pandit <redacted>
---
 drivers/virtio/virtio_pci_common.c | 7 -------
 1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index d6d79af44569..dba5eb2eaff9 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
 	struct device *dev = get_device(&vp_dev->vdev.dev);
 
-	/*
-	 * Device is marked broken on surprise removal so that virtio upper
-	 * layers can abort any ongoing operation.
-	 */
-	if (!pci_device_is_present(pci_dev))
-		virtio_break_device(&vp_dev->vdev);
-
 	pci_disable_sriov(pci_dev);
 
 	unregister_virtio_device(&vp_dev->vdev);
-- 
2.26.2
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help