Thread (18 messages, 6 authors, 2020-07-27)

Re: [PATCH 2/2] zonefs: use zone-append for AIO as well

From: Damien Le Moal <hidden>
Date: 2020-07-22 13:02:18
Also in: linux-fsdevel

On 2020/07/22 21:43, Johannes Thumshirn wrote:
On 21/07/2020 07:54, Christoph Hellwig wrote:
On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
On 20/07/2020 15:45, Christoph Hellwig wrote:
On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
On a successful completion, the position the data is written to is
returned via AIO's res2 field to the calling application.
That is a major, and except for this changelog, undocumented ABI
change.  We had the whole discussion about reporting append results
in a few threads and the issues with that in io_uring.  So let's
have that discussion there and don't mix it up with how zonefs
writes data.  Without that a lot of the boilerplate code should
also go away.
OK, maybe I don't remember correctly, but wasn't this all about
io_uring and how we'd report the location back for raw block device
access?
Report the write offset.  The author seems to be hell-bent on making
it block device specific, but that is a horrible idea as it is just
as useful for normal file systems (or zonefs).
After having looked into io_uring, I don't think there is anything that
prevents io_uring from picking up the write offset from ki_complete's
res2 argument. As of now io_uring ignores the field, but that can be
changed.
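A minimal user-space model of the change suggested above, assuming the three-argument `ki_complete(iocb, ret, ret2)` callback of kernels from this period. The `demo_kiocb` and `demo_cqe` types are hypothetical stand-ins for `struct kiocb` and an io_uring completion entry; the `offset` slot in `demo_cqe` is invented for illustration and is not part of the real io_uring ABI, whose reporting format is still undecided in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an io_uring completion entry. The offset
 * slot is invented for this sketch; the real CQE has no such field. */
struct demo_cqe {
	long long res;     /* bytes written, or negative errno */
	long long offset;  /* effective write offset, -1 if not reported */
};

/* Hypothetical stand-in for struct kiocb, reduced to the completion
 * callback (three-argument form) and a pointer to its CQE. */
struct demo_kiocb {
	void (*ki_complete)(struct demo_kiocb *iocb, long ret, long ret2);
	struct demo_cqe *cqe;
};

/* Current behaviour: ret2 is accepted but thrown away. */
static void complete_ignore_res2(struct demo_kiocb *iocb, long ret, long ret2)
{
	assert(iocb->cqe != NULL);
	(void)ret2;
	iocb->cqe->res = ret;
	iocb->cqe->offset = -1;
}

/* Sketched change: forward ret2 as the write offset on success only. */
static void complete_with_res2(struct demo_kiocb *iocb, long ret, long ret2)
{
	assert(iocb->cqe != NULL);
	iocb->cqe->res = ret;
	iocb->cqe->offset = ret >= 0 ? ret2 : -1;
}
```

The only difference between the two callbacks is whether `ret2` is dropped or forwarded; how the forwarded value then reaches user space is the open io_uring question.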

The reporting of the write offset to user space still needs to be
decided on from an io_uring PoV.

So the only thing that needs to be done from a zonefs perspective is
documenting the use of res2 and CCing linux-aio and linux-api (including
an update of the io_getevents man page).

Or am I completely off track now?
That is the general idea. But Christoph's point was that reporting the effective
write offset back to user space can be done not only for zone append, but also
for regular FS/files that are opened with O_APPEND and written with AIOs,
legacy or io_uring. In that case, the aio->aio_offset field is ignored: the kiocb
position is initialized to the file size and then advanced by the write size
for the next AIO, so the user never sees the actual write offset of its AIOs.
Reporting it back for regular files too can be useful, even though current
applications can do without it (or do not use O_APPEND because it is lacking).

Christoph, please loudly shout at me if I misunderstood you :)

For the regular FS/file case, getting the written file offset is simple: it only
needs kiocb->ki_pos. That is not a per-FS change.
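A rough user-space model of the O_APPEND behaviour described above, assuming a single submitter and ignoring the locking a real file system needs to serialize appends. `demo_file`, `demo_kiocb` and `demo_append()` are hypothetical; only the `ki_pos` name follows the real `struct kiocb`. The requested `aio_offset` is ignored, `ki_pos` starts at the file size, and the offset reported in `res2` is the value of `ki_pos` before the write advances it.

```c
#include <assert.h>

/* Hypothetical model of a regular file written with O_APPEND AIOs. */
struct demo_file {
	long long size;  /* current file size */
};

/* Hypothetical, trimmed-down kiocb plus its completion results. */
struct demo_kiocb {
	long long ki_pos;  /* write position used by the file system */
	long long res;     /* bytes written */
	long long res2;    /* proposed: effective write offset */
};

/* Submit and complete one append of len bytes (assumed to succeed). */
static void demo_append(struct demo_file *f, struct demo_kiocb *iocb,
			long long aio_offset, long long len)
{
	assert(len >= 0);
	(void)aio_offset;                /* ignored for O_APPEND */
	iocb->ki_pos = f->size;          /* O_APPEND: start at end of file */
	long long start = iocb->ki_pos;  /* remember before the write moves it */
	f->size += len;                  /* next append lands after this one */
	iocb->ki_pos += len;             /* write path advances the position */
	iocb->res = len;
	iocb->res2 = start;              /* equivalently ki_pos - res */
}
```

Two back-to-back 10- and 20-byte appends to a 100-byte file would report offsets 100 and 110 in `res2`, which the caller cannot otherwise learn from the completion.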

For the user interface, yes, I agree, res2 is the way to go. And we need to
decide for io_uring how to do it. That is an API change, backward compatible for
legacy AIO, but still a change. So the linux-aio and linux-api lists should be
consulted. Ideally, for io_uring, something backward compatible would be nice
too. Not sure how to do it yet.
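On the legacy AIO side, `io_getevents(2)` already returns a `struct io_event` with both `res` and `res2` fields, so a consumer of the proposed semantics could look like the sketch below. The struct is redeclared locally to mirror the layout in `<linux/aio_abi.h>` so it builds without kernel headers, and `demo_written_offset()` is a hypothetical helper; treating `res2` as the write offset is the proposal under discussion, not established behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct io_event from <linux/aio_abi.h>,
 * redeclared so this sketch builds without kernel headers. */
struct demo_io_event {
	uint64_t data;  /* user cookie copied from the iocb */
	uint64_t obj;   /* address of the originating iocb */
	int64_t  res;   /* bytes written, or negative errno */
	int64_t  res2;  /* secondary result */
};

/* Hypothetical helper: under the proposed semantics, return the write
 * offset carried in res2, or -1 if the request failed. */
static int64_t demo_written_offset(const struct demo_io_event *ev)
{
	assert(ev != NULL);
	if (ev->res < 0)
		return -1;
	return ev->res2;
}
```

Applications that only read `res` are unaffected by what the kernel puts in `res2`, which is why the change is backward compatible for legacy AIO.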

Whatever the interface, plugging zonefs into it is the trivial part as you
already did the heavier lifting with writing the async zone append path.

Thanks,
	Johannes

-- 
Damien Le Moal
Western Digital Research