Re: [PATCH v2 0/2] zone-append support in io-uring and aio
From: Kanchan Joshi <hidden>
Date: 2020-06-26 22:18:51
Also in:
io-uring, linux-fsdevel, lkml
On Fri, Jun 26, 2020 at 03:11:55AM +0000, Damien Le Moal wrote:
> On 2020/06/26 2:18, Kanchan Joshi wrote:
> > Semantics --->
> > Zone-append, by its nature, may perform write on a different location than what was specified. It does not fit into POSIX, and trying to fit may just undermine its benefit. It may be better to keep semantics as close to zone-append as possible i.e. specify zone-start location, and obtain the actual-write location post completion. Towards that goal, existing async APIs seem to fit fine. Async APIs (uring, linux aio) do not work on implicit write-pointer and demand explicit write offset (which is what we need for append). Neither write-pointer
>
> What do you mean by "implicit write pointer"? Are you referring to the behavior of AIO write with a block device file open with O_APPEND? Then yes, it does not work. But that is perfectly fine for regular files, that is, for zonefs.
Sorry, I meant file pointer. Yes, a block device opened with O_APPEND does not advance the file pointer to the end of the device. That said, for uring and aio the file-pointer position plays no role, and it is the application's responsibility to pass the right write location.
> I would prefer that this paragraph simply state the semantic that is implemented first. Then explain why the choice. But first, clarify how the API works, what is allowed, what's not etc. That will also simplify reviewing the code as one can then check the code against the goal.
In this path (block IO) there is hardly any scope for, or attempt at, abstracting anything away, so the raw zoned-storage rules and semantics apply. I expect the consumers of this to be zone-aware applications, which already know those rules.
> > is taken as input, nor is it updated on completion. And there is a clear way to get zone-append result. Zone-aware applications while using these async APIs can be fine with, for the lack of better word, zone-append semantics itself. Sync APIs work with implicit write-pointer (at least few of those), and there is no way to obtain zone-append result, making it hard for user-space zone-append.
>
> Sync APIs are executed under inode lock, at least for regular files. So there is absolutely no problem to use zone append. zonefs does it already. The problem is the lack of locking for block device file.
Yes. I was referring to the problem of returning the actual write location through sync APIs such as write, pwrite and pwritev/pwritev2.
> > Tests --->
> > Using new interface in fio (uring and libaio engine) by extending zbd tests for zone-append: https://github.com/axboe/fio/pull/1026
> >
> > Changes since v1:
> > - No new opcodes in uring or aio. Use RWF_ZONE_APPEND flag instead.
> > - linux-aio changes vanish because of no new opcode
> > - Fixed the overflow and other issues mentioned by Damien
> > - Simplified uring support code, fixed the issues mentioned by Pavel
> > - Added error checks
> >
> > Kanchan Joshi (1):
> >   fs,block: Introduce RWF_ZONE_APPEND and handling in direct IO path
> >
> > Selvakumar S (1):
> >   io_uring: add support for zone-append
> >
> >  fs/block_dev.c          | 28 ++++++++++++++++++++++++----
> >  fs/io_uring.c           | 32 ++++++++++++++++++++++++++++++--
> >  include/linux/fs.h      |  9 +++++++++
> >  include/uapi/linux/fs.h |  5 ++++-
> >  4 files changed, 67 insertions(+), 7 deletions(-)
>
> --
> Damien Le Moal
> Western Digital Research