Thread: 29 messages, 6 authors, 2022-01-23

Re: [PATCH v9 09/11] firmware: arm_scmi: Add atomic mode support to virtio transport

From: Cristian Marussi <cristian.marussi@arm.com>
Date: 2022-01-23 22:46:02
Also in: lkml

On Sun, Jan 23, 2022 at 05:40:08PM -0500, Michael S. Tsirkin wrote:
On Sun, Jan 23, 2022 at 08:02:54PM +0000, Cristian Marussi wrote:
I was thinking...keeping the current virtqueue_poll interface, since our
possible issue arises from the used_index wrapping around exactly on top
of the same polled index and given that currently the API returns an
unsigned "opaque" value really carrying just the 16-bit index (and possibly
the wrap bit as bit15 for packed vq) that is supposed to be fed back as
it is to the virtqueue_poll() function....

...why don't we just keep an internal full-fledged per-virtqueue wrap counter
and return it as the upper 16 bits of the opaque value returned by
virtqueue_enable_cb_prepare(), and then check it in virtqueue_poll() when the
opaque value is fed back? (filtering it out before it reaches the internal helpers)

As in the example below the scissors.

I mean if the internal wrap count is at that point different from the
one provided to virtqueue_poll() via the opaque poll_idx value previously
provided, certainly there is something new to fetch without even looking
at the indexes: at the same time, exposing an opaque index built as
(wraps << 16 | idx) implicitly 'binds' each index to a specific
wrap-iteration, so they can be distinguished (..ok until the 16-bit
wrap count itself wraps too....but...)
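
A minimal userspace sketch of the (wraps << 16 | idx) idea described above,
assuming a split-style free-running 16-bit index. struct demo_vq and every
demo_* name are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant vring_virtqueue fields. */
struct demo_vq {
	uint16_t last_used_idx;	/* next used entry the driver will consume */
	uint16_t wraps;		/* times last_used_idx has wrapped to 0 */
	uint16_t used_idx;	/* index published by the device */
};

/* Pack as (wraps << 16 | idx), mirroring the proposed opaque layout. */
static uint32_t demo_build_opaque(uint16_t idx, uint16_t wraps)
{
	return ((uint32_t)wraps << 16) | idx;
}

static uint16_t demo_get_idx(uint32_t opaque)
{
	return (uint16_t)(opaque & 0xffff);
}

static uint16_t demo_get_wraps(uint32_t opaque)
{
	return (uint16_t)(opaque >> 16);
}

/* Consume one used buffer, counting wraps of the 16-bit index. */
static void demo_get_buf(struct demo_vq *vq)
{
	vq->last_used_idx++;
	if (vq->last_used_idx == 0)
		vq->wraps++;
}

/* Snapshot the current position as an opaque value for a later poll. */
static uint32_t demo_enable_cb_prepare(const struct demo_vq *vq)
{
	return demo_build_opaque(vq->last_used_idx, vq->wraps);
}

/* True if something new may be pending since the opaque value was taken. */
static bool demo_poll(const struct demo_vq *vq, uint32_t opaque)
{
	if (vq->wraps != demo_get_wraps(opaque))
		return true;	/* index wrapped since the snapshot */
	return demo_get_idx(opaque) != vq->used_idx;
}
```

With an index-only comparison, a snapshot taken at index 0 looks unchanged
after exactly 65536 buffers have been used and consumed; in this sketch the
wraps field differs at that point, so demo_poll() reports pending work.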

I am not really extremely familiar with the internals of virtio so I
could be missing something obvious...feel free to insult me :P

(..and I have not made any perf measurements or considerations at this
point....nor considered the redundancy of the existing packed
used_wrap_counter bit...)

Thanks,
Cristian

----
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 00f64f2f8b72..bda6af121cd7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -117,6 +117,8 @@ struct vring_virtqueue {
        /* Last used index we've seen. */
        u16 last_used_idx;
 
+       u16 wraps;
+
        /* Hint for event idx: already triggered no need to disable. */
        bool event_triggered;
 
@@ -806,6 +808,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
        ret = vq->split.desc_state[i].data;
        detach_buf_split(vq, i, ctx);
        vq->last_used_idx++;
+       if (unlikely(!vq->last_used_idx))
+               vq->wraps++;

I wonder whether
               vq->wraps += !vq->last_used_idx;
is faster or slower. No branch but OTOH a dependency.
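
For reference, a small standalone sketch of the two forms being compared.
The function names are hypothetical; this only shows that the two variants
update the counter identically and says nothing about which is faster.

```c
#include <stdint.h>

/* Branching form, as in the patch hunk above. */
static void demo_inc_branch(uint16_t *last_used_idx, uint16_t *wraps)
{
	(*last_used_idx)++;
	if (*last_used_idx == 0)
		(*wraps)++;
}

/* Branchless form: !x evaluates to 1 only when x == 0, else to 0. */
static void demo_inc_branchless(uint16_t *last_used_idx, uint16_t *wraps)
{
	(*last_used_idx)++;
	*wraps += !*last_used_idx;
}
```

The branchless variant makes the wraps update depend on the freshly
incremented index on every call, which is the dependency mentioned above.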

        /* If we expect an interrupt for the next entry, tell host
         * by writing event index and flush out the write before
         * the read in the next get_buf call. */
@@ -1508,6 +1512,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
        if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
                vq->last_used_idx -= vq->packed.vring.num;
                vq->packed.used_wrap_counter ^= 1;
+               vq->wraps++;
        }
 
        /*
@@ -1744,6 +1749,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
        vq->weak_barriers = weak_barriers;
        vq->broken = false;
        vq->last_used_idx = 0;
+       vq->wraps = 0;
        vq->event_triggered = false;
        vq->num_added = 0;
        vq->packed_ring = true;
@@ -2092,13 +2098,17 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
  */
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 {
+       unsigned last_used_idx;
        struct vring_virtqueue *vq = to_vvq(_vq);
 
        if (vq->event_triggered)
                vq->event_triggered = false;
 
-       return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
-                                virtqueue_enable_cb_prepare_split(_vq);
+       last_used_idx = vq->packed_ring ?
+                       virtqueue_enable_cb_prepare_packed(_vq) :
+                       virtqueue_enable_cb_prepare_split(_vq);
+
+       return VRING_BUILD_OPAQUE(last_used_idx, vq->wraps);
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
@@ -2118,9 +2128,13 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
        if (unlikely(vq->broken))
                return false;
 
+       if (unlikely(vq->wraps != VRING_GET_WRAPS(last_used_idx)))
+               return true;
+
        virtio_mb(vq->weak_barriers);
-       return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) :
-                                virtqueue_poll_split(_vq, last_used_idx);
+       return vq->packed_ring ?
+               virtqueue_poll_packed(_vq, VRING_GET_IDX(last_used_idx)) :
+               virtqueue_poll_split(_vq, VRING_GET_IDX(last_used_idx));
 }
 EXPORT_SYMBOL_GPL(virtqueue_poll);
 
@@ -2245,6 +2259,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
        vq->weak_barriers = weak_barriers;
        vq->broken = false;
        vq->last_used_idx = 0;
+       vq->wraps = 0;
        vq->event_triggered = false;
        vq->num_added = 0;
        vq->use_dma_api = vring_use_dma_api(vdev);
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 476d3e5c0fe7..e6b03017ebd7 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -77,6 +77,17 @@
  */
 #define VRING_PACKED_EVENT_F_WRAP_CTR  15
 
+#define VRING_IDX_MASK                                 GENMASK(15, 0)
+#define VRING_GET_IDX(opaque)                          \
+       ((u16)FIELD_GET(VRING_IDX_MASK, (opaque)))
+
+#define VRING_WRAPS_MASK                               GENMASK(31, 16)
+#define VRING_GET_WRAPS(opaque)                                \
+       ((u16)FIELD_GET(VRING_WRAPS_MASK, (opaque)))
+
+#define VRING_BUILD_OPAQUE(idx, wraps)                 \
+       (FIELD_PREP(VRING_WRAPS_MASK, (wraps)) | ((idx) & VRING_IDX_MASK))
+
 /* We support indirect buffer descriptors */
 #define VIRTIO_RING_F_INDIRECT_DESC    28

Yea I think this patch increases the time it takes to wrap around from
2^16 to 2^32 which seems good enough.
Need some comments to explain the logic.
Would be interesting to see perf data.
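
A back-of-the-envelope sketch of that wrap-around horizon, assuming the
16-bit wraps field from the patch. demo_alias_period() and DEMO_WRAP_BITS
are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define DEMO_WRAP_BITS 16	/* width of the proposed wraps field */

/*
 * Number of consumed buffers after which an opaque (wraps << 16 | idx)
 * value repeats, given how many buffers it takes the index to wrap once.
 */
static uint64_t demo_alias_period(uint32_t idx_period)
{
	return (uint64_t)idx_period << DEMO_WRAP_BITS;
}
```

For the split ring the index is a free-running 16-bit counter, so
demo_alias_period(1u << 16) gives 2^32. If I read the packed hunk correctly,
wraps is bumped every vring.num buffers there, so the horizon would be
num << 16 instead, e.g. 2^24 for a 256-entry ring.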

Thanks for your feedback!

I'll try to gather some perf data around it in the next few days.
(and eventually clean it up and add comments if it is good enough...)

Thanks,
Cristian


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel