Re: [PATCH V2] virtio-net: fix the race between refill work and close
From: Jason Wang <jasowang@redhat.com>
Date: 2022-06-30 06:15:54
Also in:
lkml, virtualization
On Thu, Jun 30, 2022 at 2:07 PM Jason Wang [off-list ref] wrote:
On Thu, Jun 30, 2022 at 10:51 AM Jakub Kicinski [off-list ref] wrote:quoted
On Thu, 30 Jun 2022 10:08:04 +0800 Jason Wang wrote:quoted
+static void enable_refill_work(struct virtnet_info *vi) +{ + spin_lock(&vi->refill_lock); + vi->refill_work_enabled = true; + spin_unlock(&vi->refill_lock); +} + +static void disable_refill_work(struct virtnet_info *vi) +{ + spin_lock(&vi->refill_lock); + vi->refill_work_enabled = false; + spin_unlock(&vi->refill_lock); +} + static void virtqueue_napi_schedule(struct napi_struct *napi, struct virtqueue *vq) {@@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget, } if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { - if (!try_fill_recv(vi, rq, GFP_ATOMIC)) - schedule_delayed_work(&vi->refill, 0); + if (!try_fill_recv(vi, rq, GFP_ATOMIC)) { + spin_lock(&vi->refill_lock); + if (vi->refill_work_enabled) + schedule_delayed_work(&vi->refill, 0); + spin_unlock(&vi->refill_lock);Are you sure you can use the basic spin_lock() flavor in all cases? Isn't the disable/enable called from a different context than this thing here?This function will only be called in bh so it's safe.
Ok, so it looks like we should use the bh variant in close. Otherwise we may have a deadlock. Will fix it. Thanks
quoted
The entire delayed work construct seems a little risky because the work may go to sleep after disabling napi, causing large latency spikes.Yes, but it only happens on OOM.quoted
I guess you must have a good reason no to simply reschedule the NAPI and keep retrying with GFP_ATOMIC...Less pressure on the memory allocator on OOM probably, but it looks like an independent issue that might be optimized in the future.quoted
Please add the target tree name to the subject.Ok Thanksquoted