Thread (6 messages) 6 messages, 2 authors, 2021-09-06

Re: [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

From: Stefan Hajnoczi <stefanha@redhat.com>
Date: 2021-08-24 16:36:51
Also in: lkml

On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:
On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
quoted
While investigating an unhandled irq from vring_interrupt() with virtiofs I
stumbled onto a possible race that also affects virtio_gpu. This theory is
based on code inspection and hopefully you can point out something that makes
this a non-issue in practice :).

The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
taken with no used (completed) buffers in the virtqueue. The kernel disables
the irq and the driver is no longer receives irqs when this happens:

  irq 77: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<00000000a40a49bb>] vring_interrupt
  Disabling IRQ #77

Consider the following:

1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
   function is scheduled to dequeue used buffers:
   vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()

2. The device adds more used requests and just before...

3. ...virtio_fs_requests_done_work() empties the virtqueue with
   virtqueue_get_buf().

4. The device raises the irq and vring_interrupt() is called after
   virtio_fs_requests_done_work emptied the virtqueue:

   irqreturn_t vring_interrupt(int irq, void *_vq)
   {
       struct vring_virtqueue *vq = to_vvq(_vq);

       if (!more_used(vq)) {
           pr_debug("virtqueue interrupt with no work for %p\n", vq);
           return IRQ_NONE;
           ^^^^^^^^^^^^^^^^

I have included a patch that switches virtiofs from spin_lock() to
spin_lock_irqsave() to prevent vring_interrupt() from running while the
virtqueue is processed from a work function.

virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
I think this can result in the same false unhandled irq problem as virtiofs.

This race condition could in theory affect all drivers. The VIRTIO
specification says:

  Neither of these notification suppression methods are reliable, as they are
  not synchronized with the device, but they serve as useful optimizations.

If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
then virtio_net and other drivers have a problem because because an irq could
be raised while the driver is dequeuing used buffers. I think we haven't seen
this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
devices might cache the value and not disable notifications for some time...

Have I missed something?

The virtiofs patch I attached is being stress tested to see if the unhandled
irqs still occur.

Stefan Hajnoczi (1):
  fuse: disable local irqs when processing vq completions

 fs/fuse/virtio_fs.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)
Fundamentally it is not a problem to have an unhandled IRQ
once in a while. It's only a problem if this happens time
after time.


Does the kernel you are testing include
commit 8d622d21d24803408b256d96463eac4574dcf067
Author: Michael S. Tsirkin [off-list ref]
Date:   Tue Apr 13 01:19:16 2021 -0400

    virtio: fix up virtio_disable_cb

?

If not it's worth checking whether the latest kernel still
has the issue.

Note cherry picking that commit isn't trivial there are
a bunch of dependencies.

If you want to try on an old kernel, disable event index.
Thanks for pointing out this commit. My kernel does not have it. The
device is using the split vring layout (probably with EVENT_IDX) so
virtqueue_disable_cb() has no effect.

kernel/irq/spurious.c:note_interrupt() disables the irq after 99.9k
unhandled irqs. It's surprising that virtiofs is hitting this limit
through the scenario I described, but maybe.

I'll try different kernel versions and/or disable EVENT_IDX as you
suggested. Thanks!

Stefan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help