Re: [PATCH v9 09/11] firmware: arm_scmi: Add atomic mode support to virtio transport
From: Cristian Marussi <cristian.marussi@arm.com>
Date: 2022-01-23 22:46:02
Also in:
lkml
On Sun, Jan 23, 2022 at 05:40:08PM -0500, Michael S. Tsirkin wrote:
On Sun, Jan 23, 2022 at 08:02:54PM +0000, Cristian Marussi wrote:quoted
I was thinking...keeping the current virtqueue_poll interface, since our possible issue arises from the used_index wrapping around exactly on top of the same polled index and given that currently the API returns an unsigned "opaque" value really carrying just the 16-bit index (and possibly the wrap bit as bit15 for packed vq) that is supposed to be fed back as it is to the virtqueue_poll() function.... ...why don't we just keep an internal full fledged per-virtqueue wrap-counter and return that as the MSB 16-bit of the opaque value returned by virtqueue_prepare_enable_cb and then check it back in virtqueue_poll when the opaque is fed back ? (filtering it out from the internal helpers machinery) As in the example below the scissors. I mean if the internal wrap count is at that point different from the one provided to virtqueue_poll() via the opaque poll_idx value previously provided, certainly there is something new to fetch without even looking at the indexes: at the same time, exposing an opaque index built as (wraps << 16 | idx) implicitly 'binds' each index to a specific wrap-iteration, so they can be distiguished (..ok until the wrap-count upper 16bit wraps too....but...) I am not really extremely familiar with the internals of virtio so I could be missing something obvious...feel free to insult me :P (..and I have not made any perf measurements or consideration at this point....nor considered the redundancy of the existent packed used_wrap_counter bit...) Thanks, Cristian ----diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 00f64f2f8b72..bda6af121cd7 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c@@ -117,6 +117,8 @@ struct vring_virtqueue { /* Last used index we've seen. */ u16 last_used_idx; + u16 wraps; + /* Hint for event idx: already triggered no need to disable. */ bool event_triggered;@@ -806,6 +808,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, ret = vq->split.desc_state[i].data; detach_buf_split(vq, i, ctx); vq->last_used_idx++; + if (unlikely(!vq->last_used_idx)) + vq->wraps++;I wonder whether vq->wraps += !vq->last_used_idx; is faster or slower. No branch but OTOH a dependency.quoted
/* If we expect an interrupt for the next entry, tell host * by writing event index and flush out the write before * the read in the next get_buf call. */@@ -1508,6 +1512,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) { vq->last_used_idx -= vq->packed.vring.num; vq->packed.used_wrap_counter ^= 1; + vq->wraps++; } /*@@ -1744,6 +1749,7 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->weak_barriers = weak_barriers; vq->broken = false; vq->last_used_idx = 0; + vq->wraps = 0; vq->event_triggered = false; vq->num_added = 0; vq->packed_ring = true;@@ -2092,13 +2098,17 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb); */ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) { + unsigned last_used_idx; struct vring_virtqueue *vq = to_vvq(_vq); if (vq->event_triggered) vq->event_triggered = false; - return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) : - virtqueue_enable_cb_prepare_split(_vq); + last_used_idx = vq->packed_ring ? + virtqueue_enable_cb_prepare_packed(_vq) : + virtqueue_enable_cb_prepare_split(_vq); + + return VRING_BUILD_OPAQUE(last_used_idx, vq->wraps); } EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);@@ -2118,9 +2128,13 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx) if (unlikely(vq->broken)) return false; + if (unlikely(vq->wraps != VRING_GET_WRAPS(last_used_idx))) + return true; + virtio_mb(vq->weak_barriers); - return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) : - virtqueue_poll_split(_vq, last_used_idx); + return vq->packed_ring ? + virtqueue_poll_packed(_vq, VRING_GET_IDX(last_used_idx)) : + virtqueue_poll_split(_vq, VRING_GET_IDX(last_used_idx)); } EXPORT_SYMBOL_GPL(virtqueue_poll);@@ -2245,6 +2259,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->weak_barriers = weak_barriers; vq->broken = false; vq->last_used_idx = 0; + vq->wraps = 0; vq->event_triggered = false; vq->num_added = 0; vq->use_dma_api = vring_use_dma_api(vdev);diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h index 476d3e5c0fe7..e6b03017ebd7 100644 --- a/include/uapi/linux/virtio_ring.h +++ b/include/uapi/linux/virtio_ring.h@@ -77,6 +77,17 @@ */ #define VRING_PACKED_EVENT_F_WRAP_CTR 15 +#define VRING_IDX_MASK GENMASK(15, 0) +#define VRING_GET_IDX(opaque) \ + ((u16)FIELD_GET(VRING_IDX_MASK, (opaque))) + +#define VRING_WRAPS_MASK GENMASK(31, 16) +#define VRING_GET_WRAPS(opaque) \ + ((u16)FIELD_GET(VRING_WRAPS_MASK, (opaque))) + +#define VRING_BUILD_OPAQUE(idx, wraps) \ + (FIELD_PREP(VRING_WRAPS_MASK, (wraps)) | ((idx) & VRING_IDX_MASK)) + /* We support indirect buffer descriptors */ #define VIRTIO_RING_F_INDIRECT_DESC 28Yea I think this patch increases the time it takes to wrap around from 2^16 to 2^32 which seems good enough. Need some comments to explain the logic. Would be interesting to see perf data.
Thanks for your feedback ! I'll try to gather some perf data around it next days. (and eventually cleanup and adding comments if it is god enough...) Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel