Re: [PATCH v4 6/6] io_uring: add support for zone-append
From: Jens Axboe <axboe@kernel.dk>
Date: 2020-07-27 20:34:38
Also in:
io-uring, linux-block, linux-fsdevel, lkml
On 7/27/20 1:16 PM, Kanchan Joshi wrote:
On Fri, Jul 24, 2020 at 10:00 PM Jens Axboe [off-list ref] wrote:quoted
On 7/24/20 9:49 AM, Kanchan Joshi wrote:quoted
diff --git a/fs/io_uring.c b/fs/io_uring.c index 7809ab2..6510cf5 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c@@ -1284,8 +1301,15 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags) cqe = io_get_cqring(ctx); if (likely(cqe)) { WRITE_ONCE(cqe->user_data, req->user_data); - WRITE_ONCE(cqe->res, res); - WRITE_ONCE(cqe->flags, cflags); + if (unlikely(req->flags & REQ_F_ZONE_APPEND)) { + if (likely(res > 0)) + WRITE_ONCE(cqe->res64, req->rw.append_offset); + else + WRITE_ONCE(cqe->res64, res); + } else { + WRITE_ONCE(cqe->res, res); + WRITE_ONCE(cqe->flags, cflags); + }This would be nice to keep out of the fast path, if possible.I was thinking of keeping a function-pointer (in io_kiocb) during submission. That would have avoided this check......but argument count differs, so it did not add up.
But that'd grow the io_kiocb just for this use case, which is arguably even worse. Unless you can keep it in the per-request private data, but there's no more room there for the regular read/write side.
quoted
quoted
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 92c2269..2580d93 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h@@ -156,8 +156,13 @@ enum { */ struct io_uring_cqe { __u64 user_data; /* sqe->data submission passed back */ - __s32 res; /* result code for this event */ - __u32 flags; + union { + struct { + __s32 res; /* result code for this event */ + __u32 flags; + }; + __s64 res64; /* appending offset for zone append */ + }; };Is this a compatible change, both for now but also going forward? You could randomly have IORING_CQE_F_BUFFER set, or any other future flags.Sorry, I didn't quite understand the concern. CQE_F_BUFFER is not used/set for write currently, so it looked compatible at this point.
Not worried about that, since we won't ever use that for writes. But it is a potential headache down the line for other flags, if they apply to normal writes.
Yes, no room for future flags for this operation. Do you see any other way to enable this support in io-uring?
Honestly I think the only viable option is as we discussed previously, pass in a pointer to a 64-bit type where we can copy the additional completion information to.
quoted
Layout would also be different between big and little endian, so not even that easy to set aside a flag for this. But even if that was done, we'd still have this weird API where liburing or the app would need to distinguish this cqe from all others based on... the user_data? Hence liburing can't do it, only the app would be able to. Just seems like a hack to me.Yes, only user_data to distinguish. Do liburing helpers need to look at cqe->res (and decide something) before returning the cqe to application?
They generally don't, outside of the internal timeout. But it's an issue for the API, as it forces applications to handle the CQEs a certain way. Normally there's flexibility. This makes the append writes behave differently than everything else, which is never a good idea. -- Jens Axboe