Re: [PATCH v4 6/6] io_uring: add support for zone-append

From: Jens Axboe <axboe@kernel.dk>
Date: 2020-07-30 17:16:08
Also in: io-uring, linux-block, linux-fsdevel, lkml

On 7/30/20 10:26 AM, Pavel Begunkov wrote:

On 30/07/2020 19:13, Jens Axboe wrote:

On 7/30/20 10:08 AM, Pavel Begunkov wrote:

On 27/07/2020 23:34, Jens Axboe wrote:

On 7/27/20 1:16 PM, Kanchan Joshi wrote:

On Fri, Jul 24, 2020 at 10:00 PM Jens Axboe [off-list ref] wrote:

On 7/24/20 9:49 AM, Kanchan Joshi wrote:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7809ab2..6510cf5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c

@@ -1284,8 +1301,15 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
      cqe = io_get_cqring(ctx);
      if (likely(cqe)) {
              WRITE_ONCE(cqe->user_data, req->user_data);
-             WRITE_ONCE(cqe->res, res);
-             WRITE_ONCE(cqe->flags, cflags);
+             if (unlikely(req->flags & REQ_F_ZONE_APPEND)) {
+                     if (likely(res > 0))
+                             WRITE_ONCE(cqe->res64, req->rw.append_offset);
+                     else
+                             WRITE_ONCE(cqe->res64, res);
+             } else {
+                     WRITE_ONCE(cqe->res, res);
+                     WRITE_ONCE(cqe->flags, cflags);
+             }

This would be nice to keep out of the fast path, if possible.

I was thinking of keeping a function-pointer (in io_kiocb) during
submission. That would have avoided this check......but argument count
differs, so it did not add up.

But that'd grow the io_kiocb just for this use case, which is arguably
even worse. Unless you can keep it in the per-request private data,
but there's no more room there for the regular read/write side.

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92c2269..2580d93 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h

@@ -156,8 +156,13 @@ enum {
  */
 struct io_uring_cqe {
      __u64   user_data;      /* sqe->data submission passed back */
-     __s32   res;            /* result code for this event */
-     __u32   flags;
+     union {
+             struct {
+                     __s32   res;    /* result code for this event */
+                     __u32   flags;
+             };
+             __s64   res64;  /* appending offset for zone append */
+     };
 };

Is this a compatible change, both for now but also going forward? You
could randomly have IORING_CQE_F_BUFFER set, or any other future flags.

Sorry, I didn't quite understand the concern. CQE_F_BUFFER is not
used/set for write currently, so it looked compatible at this point.

Not worried about that, since we won't ever use that for writes. But it
is a potential headache down the line for other flags, if they apply to
normal writes.
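
To make the flag concern concrete, here is a minimal user-space sketch of the proposed CQE layout. It is not kernel code: the struct name `cqe_sketch` and the helpers `fill_append_cqe()` and `legacy_flags()` are hypothetical, and fixed-width stdint types stand in for the kernel's `__u64`/`__s32`/`__u32`/`__s64`. It only shows that `res64` occupies the same bytes as `res` and `flags`, so a zone-append completion leaves no independent flags field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the CQE layout proposed in the patch (assumption:
 * stdint types replace the kernel's __u64/__s32/__u32/__s64). */
struct cqe_sketch {
    uint64_t user_data;
    union {
        struct {
            int32_t  res;
            uint32_t flags;
        };
        int64_t res64;   /* overlays res and flags */
    };
};

/* The union keeps the CQE at 16 bytes and places res64 on top of res. */
static_assert(sizeof(struct cqe_sketch) == 16, "CQE size must not change");
static_assert(offsetof(struct cqe_sketch, res64) ==
              offsetof(struct cqe_sketch, res), "res64 aliases res/flags");

/* Hypothetical helper: fill the CQE the way the quoted hunk does for a
 * zone-append request (offset on success, error code otherwise). */
static void fill_append_cqe(struct cqe_sketch *cqe, int64_t res,
                            int64_t append_offset)
{
    cqe->res64 = (res > 0) ? append_offset : res;
}

/* Hypothetical helper: what a consumer that only knows the old layout
 * would read as "flags" after such a completion. */
static uint32_t legacy_flags(const struct cqe_sketch *cqe)
{
    return cqe->flags;
}
```

After `fill_append_cqe()`, whatever `legacy_flags()` returns is half of the 64-bit offset (or of the sign-extended error code), not a flag word. That is why any future CQE flag that applies to writes could not be reported for this operation.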

Yes, no room for future flags for this operation.
Do you see any other way to enable this support in io-uring?

Honestly I think the only viable option is as we discussed previously,
pass in a pointer to a 64-bit type where we can copy the additional
completion information to.

TBH, I hate the idea of such overhead/latency at times when SSDs can
serve writes in less than 10ms. Any chance you measured how long does it

10us? :-)

Hah, 10us indeed :)

take to drag through task_work?

A 64-bit value copy is really not a lot of overhead... But yes, we'd
need to push the completion through task_work at that point, as we can't
do it from the completion side. That's not a lot of overhead, and most
notably, it's overhead that only affects this particular type.

That's not a bad starting point, and something that can always be
optimized later if need be. But I seriously doubt it'd be anything to
worry about.
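
The alternative discussed here (a user-supplied pointer to a 64-bit result, filled in from task context rather than from the completion side) can be modelled in plain user-space C. This is a rough sketch under stated assumptions, not the io_uring implementation: `append_req`, `task_ctx`, `complete_from_irq()` and `run_deferred()` are hypothetical names, and a simple linked list stands in for the kernel's task_work machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: user_offset_ptr models the user-supplied pointer
 * to a 64-bit location that receives the append offset. */
struct append_req {
    uint64_t          *user_offset_ptr;
    int64_t            append_offset;
    int32_t            res;
    struct append_req *next;
};

/* Hypothetical per-task list of completions waiting for task context. */
struct task_ctx {
    struct append_req *pending;
};

/* "Completion side": must not touch user memory, so it only records the
 * result and queues the request for later. */
static void complete_from_irq(struct task_ctx *task, struct append_req *req,
                              int32_t res, int64_t offset)
{
    req->res = res;
    req->append_offset = offset;
    req->next = task->pending;
    task->pending = req;
}

/* "Task side": runs in the submitter's context, where writing through the
 * user pointer is allowed. Returns the number of requests finished. */
static int run_deferred(struct task_ctx *task)
{
    int done = 0;

    assert(task != NULL);
    while (task->pending) {
        struct append_req *req = task->pending;

        task->pending = req->next;
        if (req->res > 0 && req->user_offset_ptr)
            *req->user_offset_ptr = (uint64_t)req->append_offset;
        done++;
    }
    return done;
}
```

In this model the extra cost is one queueing step plus one 64-bit store, and only zone-append requests pay it. A real kernel version would use a checked user-memory copy instead of a plain store and would also have to post the CQE itself.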

I probably need to look myself how it's really scheduled, but if you don't
mind, here is a quick question: if we do work_add(task) when the task is
running in the userspace, wouldn't the work execution wait until the next
syscall/allotted time ends up?

It'll get the task to enter the kernel, just like signal delivery. The only
tricky part is really if we have a dependency waiting in the kernel, like
the recent eventfd fix.

-- 
Jens Axboe
