Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Jason Wang <jasowang@redhat.com>
Date: 2018-03-16 08:34:43
Also in:
lkml, virtualization
On 2018-03-16 15:40, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>> On 2018-03-16 14:10, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>> On 2018-02-23 19:18, Tiwei Bie wrote:
>>>>> Signed-off-by: Tiwei Bie <redacted>
>>>>> ---
>>>>>  drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>  include/linux/virtio_ring.h  |   8 +-
>>>>>  2 files changed, 618 insertions(+), 89 deletions(-)
[...]
>>>>>                   cpu_addr, size, direction);
>>>>>  }
>>>>> -static void vring_unmap_one(const struct vring_virtqueue *vq,
>>>>> -                struct vring_desc *desc)
>>>>> +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
>>>>>  {
>>>> Let's split the helpers to packed/split version like other helpers?
>>>> (Consider the caller has already known the type of vq).
>>> Okay.
[...]
>>>>> +            desc[i].flags = flags;
>>>>> +
>>>>> +            desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +            desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>> +            desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>> If it's a part of chain, we only need to do this for last buffer I think.
>>> I'm not sure I've got your point about the "last buffer". But, yes, id just needs to be set for the last desc.
>> Right, I think I meant "last descriptor" :)
>>>>> +            prev = i;
>>>>> +            i++;
>>>> It looks to me prev is always i - 1?
>>> No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
>> Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
> Yes, i wraps together with vq->wrap_counter in following code:
>>>>> +            if (!indirect && i >= vq->vring_packed.num) {
>>>>> +                i = 0;
>>>>> +                vq->wrap_counter ^= 1;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +    for (; n < (out_sgs + in_sgs); n++) {
>>>>> +        for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>> +            dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>> +            if (vring_mapping_error(vq, addr))
>>>>> +                goto unmap_release;
>>>>> +
>>>>> +            flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>> +                    VRING_DESC_F_WRITE |
>>>>> +                    VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>> +                    VRING_DESC_F_USED(!vq->wrap_counter));
>>>>> +            if (!indirect && i == head)
>>>>> +                head_flags = flags;
>>>>> +            else
>>>>> +                desc[i].flags = flags;
>>>>> +
>>>>> +            desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +            desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>> +            desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>> +            prev = i;
>>>>> +            i++;
>>>>> +            if (!indirect && i >= vq->vring_packed.num) {
>>>>> +                i = 0;
>>>>> +                vq->wrap_counter ^= 1;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +    /* Last one doesn't continue. */
>>>>> +    if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>> +        head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>> I can't get the why we need this here.
>>> If only one desc is used, we will need to clear the VRING_DESC_F_NEXT flag from the head_flags.
>> Yes, I meant why following desc[prev].flags won't work for this?
> Because the update of desc[head].flags (in above case, prev == head) has been delayed. The flags is saved in head_flags.
OK, but let's try to avoid the modulo here, e.g. by tracking the number of sgs in a counter. And I see lots of duplication in the above two loops; I believe we can unify them into a single loop. The only difference is the DMA direction and the write flag.
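
To make the suggestion concrete, here is a rough, untested-against-the-kernel sketch of what a single unified loop could look like. It is a standalone userspace model, not the patch's code: every toy_* / TOY_* name is hypothetical, DMA mapping and endianness conversion are left out, and the flag bit positions (NEXT = bit 0, WRITE = bit 1, AVAIL = bit 7, USED = bit 15) are assumed from the packed ring layout. A counter n decides both the WRITE flag and whether NEXT is needed, so no modulo is required and the "last one doesn't continue" fixup goes away.

```c
#include <assert.h>
#include <stdint.h>

/* Toy flag encoding, assumed to mirror the packed ring layout. */
#define TOY_F_NEXT      (1u << 0)
#define TOY_F_WRITE     (1u << 1)
#define TOY_F_AVAIL(b)  ((uint16_t)((b) << 7))
#define TOY_F_USED(b)   ((uint16_t)((b) << 15))

struct toy_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

struct toy_ring {
    struct toy_desc *desc;
    unsigned int num;          /* ring size */
    unsigned int next_avail;   /* index of the next avail desc */
    unsigned int wrap_counter; /* flips each time the index wraps */
};

/*
 * Fill out_cnt device-readable descriptors followed by in_cnt
 * device-writable ones in one loop. The head's flags are returned
 * through *head_flags instead of being written to the ring, so the
 * caller can publish them last, after a write barrier.
 */
static unsigned int toy_add_chain(struct toy_ring *r,
                                  const uint64_t *addrs, const uint32_t *lens,
                                  unsigned int out_cnt, unsigned int in_cnt,
                                  uint16_t *head_flags)
{
    unsigned int total = out_cnt + in_cnt;
    unsigned int head = r->next_avail;
    unsigned int i = head;
    unsigned int n;

    assert(total > 0 && total <= r->num);

    for (n = 0; n < total; n++) {
        uint16_t flags = TOY_F_AVAIL(r->wrap_counter) |
                         TOY_F_USED(!r->wrap_counter);

        if (n >= out_cnt)       /* only difference between out and in */
            flags |= TOY_F_WRITE;
        if (n + 1 < total)      /* counter replaces the modulo check */
            flags |= TOY_F_NEXT;

        r->desc[i].addr = addrs[n];
        r->desc[i].len = lens[n];
        r->desc[i].id = (uint16_t)head;
        if (n == 0)
            *head_flags = flags;
        else
            r->desc[i].flags = flags;

        if (++i >= r->num) {
            i = 0;
            r->wrap_counter ^= 1;
        }
    }

    r->next_avail = i;
    return head;
}
```

In the real driver the loop body would also pick DMA_TO_DEVICE or DMA_FROM_DEVICE from the same n >= out_cnt test when mapping each sg.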
>>>>> +    else
>>>>> +        desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>> +
>>>>> +    if (indirect) {
>>>>> +        /* FIXME: to be implemented */
>>>>> +
>>>>> +        /* Now that the indirect table is filled in, map it. */
>>>>> +        dma_addr_t addr = vring_map_single(
>>>>> +            vq, desc, total_sg * sizeof(struct vring_packed_desc),
>>>>> +            DMA_TO_DEVICE);
>>>>> +        if (vring_mapping_error(vq, addr))
>>>>> +            goto unmap_release;
>>>>> +
>>>>> +        head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
>>>>> +                VRING_DESC_F_AVAIL(wrap_counter) |
>>>>> +                VRING_DESC_F_USED(!wrap_counter));
>>>>> +        vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +        vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
>>>>> +                total_sg * sizeof(struct vring_packed_desc));
>>>>> +        vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
>>>>> +    }
>>>>> +
>>>>> +    /* We're using some buffers from the free list. */
>>>>> +    vq->vq.num_free -= descs_used;
>>>>> +
>>>>> +    /* Update free pointer */
>>>>> +    if (indirect) {
>>>>> +        n = head + 1;
>>>>> +        if (n >= vq->vring_packed.num) {
>>>>> +            n = 0;
>>>>> +            vq->wrap_counter ^= 1;
>>>>> +        }
>>>>> +        vq->free_head = n;
>>>> detach_buf_packed() does not even touch free_head here, so need to explain its meaning for packed ring.
>>> Above code is for indirect support which isn't really implemented in this patch yet. For your question, free_head stores the index of the next avail desc. I'll add a comment for it or move it to union and give it a better name in next version.
>> Yes, something like avail_idx might be better.
>>>>> +    } else
>>>>> +        vq->free_head = i;
>>>> ID is only valid in the last descriptor in the list, so head + 1 should be ok too?
>>> I don't really get your point. The vq->free_head stores the index of the next avail desc.
>> I think I get your idea now, free_head has two meanings:
>> - next avail index
>> - buffer id
> In my design, free_head is just the index of the next avail desc. Driver can set anything to buffer ID.
Then you need another method to map an ID back to its context, e.g. hashing.
> And in my design, I save desc index in buffer ID. I'll add comments for them.
>> If I'm correct, let's better add a comment for this.
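
The ID-to-context question above can be illustrated with a small standalone sketch. It assumes the design where the head descriptor index is stored as the buffer ID, so the per-buffer state can live in a plain array indexed by that ID and no hashing is needed. The toy_* names and the fixed ring size are hypothetical; the two fields only loosely mirror desc_state[] in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u   /* hypothetical ring size for the sketch */

/* Per-buffer bookkeeping, loosely modelled on desc_state[] in the patch. */
struct toy_desc_state {
    void *data;            /* caller's token */
    unsigned int num;      /* descriptors used by this buffer */
};

struct toy_vq {
    struct toy_desc_state state[TOY_RING_SIZE];
};

/* On add: the head index doubles as the buffer ID written to the ring. */
static void toy_store_token(struct toy_vq *vq, uint16_t head,
                            void *data, unsigned int num)
{
    assert(head < TOY_RING_SIZE);
    vq->state[head].data = data;
    vq->state[head].num = num;
}

/*
 * On completion: the device hands back the buffer ID, which indexes the
 * state array directly. Returns NULL for an out-of-range or unused ID.
 */
static void *toy_get_token(struct toy_vq *vq, uint16_t id, unsigned int *num)
{
    void *data;

    if (id >= TOY_RING_SIZE || !vq->state[id].data)
        return NULL;

    data = vq->state[id].data;
    *num = vq->state[id].num;
    vq->state[id].data = NULL;   /* slot is free again */
    return data;
}
```

If the driver were free to put arbitrary values in the buffer ID, the array lookup in toy_get_token() would have to be replaced by some other mapping, which is the extra cost being pointed out here.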
>>>>> +
>>>>> +    /* Store token and indirect buffer state. */
>>>>> +    vq->desc_state[head].num = descs_used;
>>>>> +    vq->desc_state[head].data = data;
>>>>> +    if (indirect)
>>>>> +        vq->desc_state[head].indir_desc = desc;
>>>>> +    else
>>>>> +        vq->desc_state[head].indir_desc = ctx;
>>>>> +
>>>>> +    virtio_wmb(vq->weak_barriers);
>>>> Let's add a comment to explain the barrier here.
>>> Okay.
>>>>> +    vq->vring_packed.desc[head].flags = head_flags;
>>>>> +    vq->num_added++;
>>>>> +
>>>>> +    pr_debug("Added buffer head %i to %p\n", head, vq);
>>>>> +    END_USE(vq);
>>>>> +
>>>>> +    return 0;
>>>>> +
>>>>> +unmap_release:
>>>>> +    err_idx = i;
>>>>> +    i = head;
>>>>> +
>>>>> +    for (n = 0; n < total_sg; n++) {
>>>>> +        if (i == err_idx)
>>>>> +            break;
>>>>> +        vring_unmap_one(vq, &desc[i]);
>>>>> +        i++;
>>>>> +        if (!indirect && i >= vq->vring_packed.num)
>>>>> +            i = 0;
>>>>> +    }
>>>>> +
>>>>> +    vq->wrap_counter = wrap_counter;
>>>>> +
>>>>> +    if (indirect)
>>>>> +        kfree(desc);
>>>>> +
>>>>> +    END_USE(vq);
>>>>> +    return -EIO;
>>>>> +}
[...]
>>>>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>      if (!queue) {
>>>>>          /* Try to get a single page. You are my only hope! */
>>>>> -        queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>> +        queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>> +                                 packed),
>>>>>                        &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>      }
>>>>>      if (!queue)
>>>>>          return NULL;
>>>>> -    queue_size_in_bytes = vring_size(num, vring_align);
>>>>> -    vring_init(&vring, num, queue, vring_align);
>>>>> +    queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>> +    if (packed)
>>>>> +        vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>> +    else
>>>>> +        vring_init(&vring.vring_split, num, queue, vring_align);
>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h. I don't think we can rename it.
>> I see, then this need more thoughts to unify the API.
> My thought is to keep the old API as is, and introduce new types and helpers for packed ring.
I admit it's not a fault of this patch. But we'd better think about this in the future, considering we may have new kinds of rings.
> More details can be found in this patch:
> https://lkml.org/lkml/2018/2/23/243
> (PS. The type which has bit fields is just for reference, and will be changed in next version.)
>
> Do you have any other suggestions?
No. Thanks
> Best regards,
> Tiwei Bie
>>>>> -    vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>> -                   notify, callback, name);
>>>>> +    vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>> +                   context, notify, callback, name);
>>>>>      if (!vq) {
>>>>>          vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>                   dma_addr);
[...]