Thread (16 messages) 16 messages, 3 authors, 2018-11-23

Re: [PATCH 8/8] aio: support for IO polling

From: Benny Halevy <hidden>
Date: 2018-11-22 00:26:09
Also in: linux-fsdevel

On Wed, 2018-11-21 at 06:26 -0700, Jens Axboe wrote:
On 11/21/18 4:12 AM, Benny Halevy wrote:
quoted
quoted
+#define AIO_POLL_STACK	8
+
+/*
+ * Process completed iocb iopoll entries, copying the result to userspace.
+ */
+static long aio_iopoll_reap(struct kioctx *ctx, struct io_event __user *evs,
+			    unsigned int *nr_events, long max)
+{
+	void *iocbs[AIO_POLL_STACK];
+	struct aio_kiocb *iocb, *n;
+	int to_free = 0, ret = 0;
To be on the safe side, how about checking that if (evs)
*nr_events < max, otherwise, return -EINVAL?
Good point, I think we should re-arrange the loop a bit to move the
check up at the top to guard for entries == max at entry. I've done
that.
quoted
quoted
+	/*
+	 * Take in a new working set from the submitted list if possible.
+	 */
+	if (!list_empty_careful(&ctx->poll_submitted)) {
+		spin_lock(&ctx->poll_lock);
+		list_splice_init(&ctx->poll_submitted, &ctx->poll_completing);
+		spin_unlock(&ctx->poll_lock);
+	}
+
+	if (list_empty(&ctx->poll_completing))
+		return 0;
Could be somewhat optimized like this:

	if (list_empty_careful(&ctx->poll_submitted))
		return 0;

	spin_lock(&ctx->poll_lock);
	list_splice_init(&ctx->poll_submitted, &ctx->poll_completing);
	spin_unlock(&ctx->poll_lock);
	if (list_empty(&ctx->poll_completing))
		return 0;

Or, possibly...
	if (list_empty_careful(&ctx->poll_submitted) ||
	    ({
		spin_lock(&ctx->poll_lock);
		list_splice_init(&ctx->poll_submitted, &ctx->poll_completing);
		spin_unlock(&ctx->poll_lock);
		list_empty(&ctx->poll_completing);
	    }))
		return 0;
I think the readability of the existing version is better.
quoted
quoted
+	/*
+	 * Check again now that we have a new batch.
+	 */
+	ret = aio_iopoll_reap(ctx, event, nr_events, max);
+	if (ret < 0)
+		return ret;
+	if (*nr_events >= min)
+		return 0;
+
+	/*
+	 * Find up to 'max_nr' worth of events to poll for, including the
What's max_nr? You mean 'max'?
It should, corrected.
quoted
quoted
+	 * events we already successfully polled
+	 */
+	polled = to_poll = 0;
+	poll_completed = atomic_read(&ctx->poll_completed);
+	list_for_each_entry(iocb, &ctx->poll_completing, ki_list) {
+		/*
+		 * Poll for needed events with wait == true, anything after
+		 * that we just check if we have more, up to max.
+		 */
+		bool wait = polled + *nr_events >= min;
+		struct kiocb *kiocb = &iocb->rw;
+
+		if (test_bit(IOCB_POLL_COMPLETED, &iocb->ki_flags))
+			break;
+		if (++to_poll + *nr_events >= max)
+			break;
+
+		polled += kiocb->ki_filp->f_op->iopoll(kiocb, wait);
Could iopoll return a negative value? (Currently not in this patchset,
but would it be possible in the future?)
That's a good point, I've added a separate check for this. Given that
it's a regular fops handler, it should be perfectly valid to return
-ERROR.
quoted
quoted
+		if (polled + *nr_events >= max)
+			break;
+		if (poll_completed != atomic_read(&ctx->poll_completed))
+			break;
+	}
+
+	ret = aio_iopoll_reap(ctx, event, nr_events, max);
+	if (ret < 0)
+		return ret;
+	if (*nr_events >= min)
+		return 0;
+	return to_poll;
What does the returned value mean?
If the intention is only to return a value greater than zero,
how about just returning to_poll > 0?
It just means that you could call us again, if > 0, and < 0 is an error
specifically.
quoted
quoted
+/*
+ * We can't just wait for polled events to come to us, we have to actively
+ * find and complete them.
+ */
+static void aio_iopoll_reap_events(struct kioctx *ctx)
+{
+	if (!(ctx->flags & IOCTX_FLAG_IOPOLL))
+		return;
+
+	while (!list_empty_careful(&ctx->poll_submitted) ||
+	       !list_empty(&ctx->poll_completing)) {
+		unsigned int nr_events = 0;
+
+		__aio_iopoll_check(ctx, NULL, &nr_events, 1, UINT_MAX);
BUG_ON(__aoi_iopoll_check() < 0) ?
Ho hum...
quoted
quoted
+	}
+}
+
+static int aio_iopoll_check(struct kioctx *ctx, long min_nr, long nr,
+			    struct io_event __user *event)
+{
+	unsigned int nr_events = 0;
+	int ret = 0;
+
+	/* * Only allow one thread polling at a time */
nit: extra '* '
Removed.

Thanks for your review!
You're welcome. Always happy to help!
Thanks for introducing this interface!
We intend to make use of it in scylla, via seastar
(See http://seastar.io/ https://github.com/scylladb/seastar)

Benny

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help