Re: [PATCH] [RFC] virtio: Limit the retries on a virtio device reset
From: Cornelia Huck <cohuck@redhat.com>
Date: 2017-08-24 11:07:46
On Wed, 23 Aug 2017 18:33:02 +0200 Pierre Morel [off-list ref] wrote:
Resetting a device can sometimes fail, even for a virtual device. If the device has not been reset after a while, the driver should abandon the retries. This is the change proposed for the modern virtio_pci. More generally, when this happens, the virtio driver can set the VIRTIO_CONFIG_S_FAILED status flag to inform the caller. The virtio core can check whether the reset was successful by testing this flag after a reset. This behavior is backward compatible with existing drivers. It also seems to me compatible with the Virtio 1.0 specification, chapter 2.1, Device Status Field. Here I definitely need your opinion: is it right?
Will have to double check with the spec.
This patch also leads to another question: do we care if a device provided by the hypervisor is buggy?
Getting into a hang because of a broken device is not nice, but I'm not sure we need to plan for this. Have you seen this in the wild?
Signed-off-by: Pierre Morel <redacted>
---
 drivers/virtio/virtio.c            |  4 ++++
 drivers/virtio/virtio_pci_modern.c | 11 ++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 48230a5..6255dc4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -324,6 +324,8 @@ int register_virtio_device(struct virtio_device *dev)
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up. This also tests that code path a little. */
 	dev->config->reset(dev);
+	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
+		return -EIO;

 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
@@ -373,6 +375,8 @@ int virtio_device_restore(struct virtio_device *dev)
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up. */
 	dev->config->reset(dev);
+	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
+		return -EIO;
virtio-ccw prior to rev 2 won't ever see this (as the read command did not exist then), but this is not really a problem.
 	/* Acknowledge that we've seen the device. */
 	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 2555d80..bfc5fc1 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -270,6 +270,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
 static void vp_reset(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	int retry_count = 10;
When you're touching this anyway, it would be a good time to add an extra blank line :)
 	/* 0 status means a reset. */
 	vp_iowrite8(0, &vp_dev->common->device_status);
 	/* After writing 0 to device_status, the driver MUST wait for a read of
@@ -277,8 +278,16 @@ static void vp_reset(struct virtio_device *vdev)
 	 * This will flush out the status write, and flush in device writes,
 	 * including MSI-X interrupts, if any.
 	 */
-	while (vp_ioread8(&vp_dev->common->device_status))
+	while (vp_ioread8(&vp_dev->common->device_status) && retry_count--)
 		msleep(1);
+	/* If the read did not return 0 before the timeout consider that
+	 * the device failed.
+	 */
+	if (retry_count <= 0) {
+		virtio_add_status(vdev, VIRTIO_CONFIG_S_FAILED);
+		return;
+	}
+	virtio_add_status(vdev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
Adding ACK here seems wrong?
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
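
For illustration only, here is a minimal userspace sketch of a bounded reset poll, not a proposal for the actual driver code. The names fake_dev, fake_read_status and fake_reset are hypothetical stand-ins, and 0xff is an arbitrary "still busy" value. Only the VIRTIO_CONFIG_S_FAILED value (0x80) is taken from the virtio headers. The sketch reports success or timeout through the return value rather than by inspecting the counter afterwards, so a device that reads back 0 on the very last attempt is not treated as failed. It also leaves the status at 0 on success, on the assumption that the caller adds ACKNOWLEDGE itself, as register_virtio_device() does in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Value as defined in the virtio headers. */
#define VIRTIO_CONFIG_S_FAILED 0x80

/* Hypothetical stand-in for a device: after a reset it reports a
 * non-zero status for the next busy_reads reads, then its real status. */
struct fake_dev {
	uint8_t status;
	int busy_reads;
};

static uint8_t fake_read_status(struct fake_dev *d)
{
	if (d->busy_reads > 0) {
		d->busy_reads--;
		return 0xff;	/* arbitrary value: reset still in progress */
	}
	return d->status;
}

/* Poll the status at most max_tries times.
 * Returns 0 if the device read back 0, -1 on timeout.
 * On timeout FAILED is set; on success the status stays 0 and
 * ACKNOWLEDGE is left to the caller. */
static int fake_reset(struct fake_dev *d, int max_tries)
{
	int i;

	assert(max_tries > 0);

	d->status = 0;	/* writing 0 requests the reset */
	for (i = 0; i < max_tries; i++) {
		if (fake_read_status(d) == 0)
			return 0;
		/* a real driver would sleep here, e.g. msleep(1) */
	}
	d->status |= VIRTIO_CONFIG_S_FAILED;
	return -1;
}
```

With max_tries set to 10, a fake device that stays busy for 9 reads is still reported as reset successfully, while one that stays busy for 10 reads ends up with FAILED set.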