Re: [RFC PATCH vhost] vhost-vdpa: Fix invalid irq bypass unregister
From: Jason Wang <jasowang@redhat.com>
Date: 2024-08-08 02:57:14
Also in:
kvm, lkml, virtualization
On Tue, Aug 6, 2024 at 10:45 PM Dragos Tatulea [off-list ref] wrote:
On 06.08.24 10:18, Dragos Tatulea wrote:quoted
(Re-sending. I messed up the previous message, sorry about that.) On 06.08.24 04:57, Jason Wang wrote:quoted
On Mon, Aug 5, 2024 at 11:59 PM Dragos Tatulea [off-list ref] wrote:quoted
On 05.08.24 05:17, Jason Wang wrote:quoted
On Fri, Aug 2, 2024 at 2:51 PM Dragos Tatulea [off-list ref] wrote:quoted
On Fri, 2024-08-02 at 11:29 +0800, Jason Wang wrote:quoted
On Thu, Aug 1, 2024 at 11:38 PM Dragos Tatulea [off-list ref] wrote:quoted
The following workflow triggers the crash referenced below: 1) vhost_vdpa_unsetup_vq_irq() unregisters the irq bypass producer but the producer->token is still valid. 2) vq context gets released and reassigned to another vq.Just to make sure I understand here, which structure is referred to as "vq context" here? I guess it's not call_ctx as it is a part of the vq itself.quoted
3) That other vq registers it's producer with the same vq context pointer as token in vhost_vdpa_setup_vq_irq().Or did you mean when a single eventfd is shared among different vqs?Yes, that's what I mean: vq->call_ctx.ctx which is a eventfd_ctx. But I don't think it's shared in this case, only that the old eventfd_ctx value is lingering in producer->token. And this old eventfd_ctx is assigned now to another vq.Just to make sure I understand the issue. The eventfd_ctx should be still valid until a new VHOST_SET_VRING_CALL().I think it's not about the validity of the eventfd_ctx. More about the lingering ctx value of the producer after vhost_vdpa_unsetup_vq_irq().Probably, butquoted
That value is the eventfd ctx, but it could be anything else really...I mean we hold a refcnt of the eventfd so it should be valid until the next set_vring_call() or vhost_dev_cleanup(). But I do spot some possible issue: 1) We swap and assign new ctx in vhost_vring_ioctl(): swap(ctx, vq->call_ctx.ctx); 2) and old ctx will be put there as well: if (!IS_ERR_OR_NULL(ctx)) eventfd_ctx_put(ctx); 3) but in vdpa, we try to unregister the producer with the new token: static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, void __user *argp) { ... r = vhost_vring_ioctl(&v->vdev, cmd, argp); ... switch (cmd) { ... case VHOST_SET_VRING_CALL: if (vq->call_ctx.ctx) { cb.callback = vhost_vdpa_virtqueue_cb; cb.private = vq; cb.trigger = vq->call_ctx.ctx; } else { cb.callback = NULL; cb.private = NULL; cb.trigger = NULL; } ops->set_vq_cb(vdpa, idx, &cb); vhost_vdpa_setup_vq_irq(v, idx); in vhost_vdpa_setup_vq_irq() we had: irq_bypass_unregister_producer(&vq->call_ctx.producer); here the producer->token still points to the old one... Is this what you have seen?Yup. That is the issue. The unregister already happened at vhost_vdpa_unsetup_vq_irq(). So this second unregister will work on an already unregistered element due to the token still being set.quoted
quoted
quoted
I may miss something but the only way to assign exactly the same eventfd_ctx value to another vq is where the guest tries to share the MSI-X vector among virtqueues, then qemu will use a single eventfd as the callback for multiple virtqueues. If this is true:I don't think this is the case. I see the issue happening when running qemu vdpa live migration tests on the same host. From a vdpa device it's basically a device starting on a VM over and over.quoted
For bypass registering, only the first registering can succeed as the following registering will fail because the irq bypass manager already had exactly the same producer token. For registering, all unregistering can succeed: 1) the first unregistering will do the real job that unregister the token 2) the following unregistering will do nothing by iterating the producer token list without finding a match one Maybe you can show me the userspace behaviour (ioctls) when you see this?Sure, what would you need? qemu traces?Yes, that would be helpful.Will try to get them.As the traces are quite large (~5MB), I uploaded them in this location [0]. I used the following qemu traces: --trace vhost_vdpa* --trace virtio_net_handle* [0] https://drive.google.com/file/d/1XyXYyockJ_O7zMgI7vot6AxYjze9Ljju/view?usp=sharing
Thanks for doing this. So it looks not like a case of eventfd sharing: """ 153@1722953531.918958:vhost_vdpa_iotlb_begin_batch vdpa:0x7f6f9cfb5190 fd: 17 msg_type: 2 type: 5 153@1722953531.918959:vhost_vdpa_set_vring_base dev: 0x55573cc9ca70 index: 6 num: 0 svq 1 153@1722953531.918961:vhost_vdpa_set_vring_kick dev: 0x55573cc9ca70 index: 6 fd: 237 153@1722953531.918964:vhost_vdpa_set_vring_call dev: 0x55573cc9ca70 index: 6 fd: 238 153@1722953531.918978:vhost_vdpa_dma_map vdpa:0x7f6f9cfb5190 fd: 17 msg_type: 2 asid: 1 iova: 0x13000 size: 0x2000 uaddr: 0x7f6f9da1a000 perm: 0x1 type: 2 153@1722953531.918984:vhost_vdpa_dma_map vdpa:0x7f6f9cfb5190 fd: 17 msg_type: 2 asid: 1 iova: 0x15000 size: 0x1000 uaddr: 0x7f6f9da19000 perm: 0x3 type: 2 153@1722953531.918987:vhost_vdpa_set_vring_addr dev: 0x55573cc9ca70 index: 6 flags: 0x0 desc_user_addr: 0x13000 used_user_addr: 0x15000 avail_user_addr: 0x14000 log_guest_\ addr: 0x0 153@1722953531.918989:vhost_vdpa_set_vring_base dev: 0x55573cc9ca70 index: 7 num: 0 svq 1 153@1722953531.918991:vhost_vdpa_set_vring_kick dev: 0x55573cc9ca70 index: 7 fd: 239 153@1722953531.918993:vhost_vdpa_set_vring_call dev: 0x55573cc9ca70 index: 7 fd: 240 """ I think a more proper way is to unregister and clean the token before calling vhost_vring_ioctl() in the case of SET_VRING_KICK. Let me try to draft a patch and see. Thanks
Thanks, Dragos