Re: [PATCH 3/3] io_uring: add support for zone-append
From: Jens Axboe <axboe@kernel.dk>
Date: 2020-06-19 14:15:12
Also in:
io-uring, linux-fsdevel, lkml
Possibly related (same subject, not in this thread)
- 2020-06-19 · Re: [PATCH 3/3] io_uring: add support for zone-append · Alexey Dobriyan <hidden>
- 2020-06-19 · Re: [PATCH 3/3] io_uring: add support for zone-append · Jens Axboe <axboe@kernel.dk>
On 6/19/20 3:41 AM, javier.gonz@samsung.com wrote:
Jens, Would you have time to answer a question below in this thread? On 18.06.2020 11:11, javier.gonz@samsung.com wrote:quoted
On 18.06.2020 08:47, Damien Le Moal wrote:quoted
On 2020/06/18 17:35, javier.gonz@samsung.com wrote:quoted
On 18.06.2020 07:39, Damien Le Moal wrote:quoted
On 2020/06/18 2:27, Kanchan Joshi wrote:quoted
From: Selvakumar S <redacted> Introduce three new opcodes for zone-append - IORING_OP_ZONE_APPEND : non-vectord, similiar to IORING_OP_WRITE IORING_OP_ZONE_APPENDV : vectored, similar to IORING_OP_WRITEV IORING_OP_ZONE_APPEND_FIXED : append using fixed-buffers Repurpose cqe->flags to return zone-relative offset. Signed-off-by: SelvaKumar S <redacted> Signed-off-by: Kanchan Joshi <redacted> Signed-off-by: Nitesh Shetty <redacted> Signed-off-by: Javier Gonzalez <redacted> --- fs/io_uring.c | 72 +++++++++++++++++++++++++++++++++++++++++-- include/uapi/linux/io_uring.h | 8 ++++- 2 files changed, 77 insertions(+), 3 deletions(-)diff --git a/fs/io_uring.c b/fs/io_uring.c index 155f3d8..c14c873 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c@@ -649,6 +649,10 @@ struct io_kiocb {unsigned long fsize; u64 user_data; u32 result; +#ifdef CONFIG_BLK_DEV_ZONED + /* zone-relative offset for append, in bytes */ + u32 append_offset;this can overflow. u64 is needed.We chose to do it this way to start with because struct io_uring_cqe only has space for u32 when we reuse the flags. We can of course create a new cqe structure, but that will come with larger changes to io_uring for supporting append. Do you believe this is a better approach?The problem is that zone size are 32 bits in the kernel, as a number of sectors. So any device that has a zone size smaller or equal to 2^31 512B sectors can be accepted. Using a zone relative offset in bytes for returning zone append result is OK-ish, but to match the kernel supported range of possible zone size, you need 31+9 bits... 32 does not cut it.Agree. Our initial assumption was that u32 would cover current zone size requirements, but if this is a no-go, we will take the longer path.Converting to u64 will require a new version of io_uring_cqe, where we extend at least 32 bits. I believe this will need a whole new allocation and probably ioctl(). Is this an acceptable change for you? We will of course add support for liburing when we agree on the right way to do this.
If you need 64-bit of return value, then it's not going to work. Even with the existing patches, reusing cqe->flags isn't going to fly, as it would conflict with eg doing zone append writes with automatic buffer selection. We're not changing the io_uring_cqe. It's important to keep it lean, and any other request type is generally fine with 64-bit tag + 32-bit result (and 32-bit flags on the side) for completions. Only viable alternative I see would be to provide an area to store this information, and pass in a pointer to this at submission time through the sqe. One issue I do see with that is if we only have this information available at completion time, then we'd need some async punt to copy it to user space... Generally not ideal. A hackier approach would be to play some tricks with cqe->res and cqe->flags, setting aside a flag to denote an extension of cqe->res. That would mean excluding zone append (etc) from using buffer selection, which probably isn't a huge deal. It'd be more problematic for any other future flags. But if you just need 40 bits, then it could certainly work. Rigth now, if cqe->flags & 1 is set, then (cqe->flags >> 16) is the buffer ID. You could define IORING_CQE_F_ZONE_FOO to be bit 1, so that: uint64_t val = cqe->res; // assuming non-error here if (cqe->flags & IORING_CQE_F_ZONE_FOO) val |= (cqe->flags >> 16) << 32ULL; and hence use the upper 16 bits of cqe->flags for the upper bits of your (then) 48-bit total value. -- Jens Axboe