Re: [PATCH 2/2] zonefs: use zone-append for AIO as well

From: Kanchan Joshi <hidden>
Date: 2020-07-24 13:58:03
Also in: linux-fsdevel

On Wed, Jul 22, 2020 at 8:22 PM Christoph Hellwig [off-list ref] wrote:

On Wed, Jul 22, 2020 at 12:43:21PM +0000, Johannes Thumshirn wrote:

On 21/07/2020 07:54, Christoph Hellwig wrote:

On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:

On 20/07/2020 15:45, Christoph Hellwig wrote:

On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:

On a successful completion, the position the data is written to is
returned via AIO's res2 field to the calling application.
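To make the proposed completion semantics concrete, here is a minimal user-space model, assuming a zone with a simple write pointer. `struct event` only mirrors the shape of the `res`/`res2` pair in AIO's `struct io_event`. `struct zone`, `zone_append()` and the `-ENOSPC` error are hypothetical choices for illustration, not zonefs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one sequential zone. */
struct zone {
    int64_t start;    /* byte offset of the zone start */
    int64_t wp;       /* current write pointer (byte offset) */
    int64_t capacity; /* bytes writable in this zone */
};

/* Mirrors the res/res2 pair of an AIO completion event (illustrative). */
struct event {
    int64_t res;  /* bytes written, or a negative errno */
    int64_t res2; /* proposed: byte offset where the data landed */
};

/*
 * Zone append: the caller does not choose the offset. The data goes to
 * the write pointer, and that position is reported back in res2.
 */
static struct event zone_append(struct zone *z, int64_t len)
{
    struct event ev = { 0, 0 };

    if (z->wp + len > z->start + z->capacity) {
        ev.res = -ENOSPC; /* errno chosen for illustration; no short write */
        return ev;
    }
    ev.res = len;
    ev.res2 = z->wp;
    z->wp += len;
    return ev;
}
```

Two appends submitted back to back land at consecutive offsets, and only `res2` tells the application where each one went.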

That is a major, and except for this changelog, undocumented ABI
change.  We had the whole discussion about reporting append results
in a few threads and the issues with that in io_uring.  So let's
have that discussion there and don't mix it up with how zonefs
writes data.  Without that a lot of the boilerplate code should
also go away.

OK, maybe I didn't remember correctly, but wasn't this all about
io_uring and how we'd report the location back for raw block device
access?

Report the write offset.  The author seems to be hell-bent on making
it block device specific, but that is a horrible idea, as it is just
as useful for normal file systems (or zonefs).

The patchset only made the feature opt-in because of the constraints we
had. zonefs was always considered, and it fits just as well as block I/O.
You already know that we did not have enough room in io_uring, which did
not really allow us to think about other filesystems (cached writes of
any size). After working on multiple schemes in io_uring, we now have 64
bits, and we will return the absolute offset in bytes (in V4).
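The room problem shows up with a little arithmetic: the block layer counts in 512-byte sectors, and once a location is expressed as a byte offset it quickly outgrows a 32-bit result field. A minimal sketch, assuming 512-byte sectors; `sector_to_bytes()` and `fits_in_s32()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as in the block layer */

/* Convert a sector number to an absolute byte offset. */
static uint64_t sector_to_bytes(uint64_t sector)
{
    return sector << SECTOR_SHIFT;
}

/* Would this byte offset survive being stored in a signed 32-bit field? */
static bool fits_in_s32(uint64_t off)
{
    return off <= (uint64_t)INT32_MAX;
}
```

For example, a write landing 2 GiB into a device (sector 4194304) already overflows a signed 32-bit result, which is why a 64-bit byte offset is needed.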

But it still comes at the cost of sacrificing the ability to do short
writes, which is fine for zone-append but may change behavior for
regular file append. A write may become short if it:
- spans beyond end-of-file
- goes beyond the RLIMIT_FSIZE limit
- probably also hits MAX_NON_LFS
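The short-write cases above share one pattern: the byte count is clamped to the room left below some limit, and the write only fails when no byte fits. A simplified sketch, loosely modeled on the size-limit checks done in `generic_write_checks()`; `clamp_write()` is a hypothetical helper, SIGXFSZ delivery is omitted, and `MAX_NON_LFS` is a local copy of the 2 GiB - 1 limit for files opened without `O_LARGEFILE`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the non-LFS limit: 2 GiB - 1. */
#define MAX_NON_LFS ((int64_t)(1ULL << 31) - 1)

/*
 * Return how many of 'count' bytes at 'pos' may be written, or -EFBIG
 * when no byte fits. A result smaller than 'count' is a short write.
 */
static int64_t clamp_write(int64_t pos, int64_t count,
                           int64_t rlimit_fsize, bool largefile)
{
    int64_t limit = rlimit_fsize;

    if (!largefile && MAX_NON_LFS < limit)
        limit = MAX_NON_LFS;
    if (pos >= limit)
        return -EFBIG;
    if (count > limit - pos)
        count = limit - pos; /* truncated: this is the short write */
    return count;
}
```

Passing a block device's size as the limit roughly models the end-of-file case for a block device.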

We would need to fail all of the above cases if we extended the current
model to regular filesystems, and that may break existing file-append
users: the class of applications that just append without caring about
the exact location. The attempt was not to affect those while we enable
the path for zone-append.
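The trade-off can be reduced to one policy decision: a plain append may return a short count, but an append that reports its location has to be all-or-nothing, so any clamped count becomes an error. This is a hypothetical illustration; `finish_append()` and the use of `-EFBIG` are assumptions, not code from the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * 'requested' is what the caller asked to write; 'allowed' is what the
 * size limits permit (already clamped, or a negative errno).
 */
static int64_t finish_append(int64_t requested, int64_t allowed,
                             bool wants_location)
{
    if (allowed < 0)
        return allowed; /* nothing fits: both models fail */
    if (allowed < requested && wants_location)
        return -EFBIG;  /* location-returning append: no short write */
    return allowed;     /* plain append: a short write is acceptable */
}
```

The third case in the test is the behavior change for existing appenders: a write that used to succeed partially would now fail outright.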

The patches use O_APPEND/RWF_APPEND, but try to isolate an appending
write (IOCB_APPEND) from an appending write that returns its location
(IOCB_ZONE_APPEND, which can be renamed once we actually have all it
takes to apply the feature to regular filesystems). So: enable block I/O
and zonefs now, and keep regular filesystems as future work. Hope that
does not sound too bad!
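The flag separation might look roughly like this. `RWF_APPEND` and `IOCB_APPEND` are real flag names, but the numeric values below are local stand-ins; `IOCB_ZONE_APPEND` mirrors the name used in the proposal, and `iocb_flags_for()` with its `supports_zone_append` parameter is hypothetical, not the patchset's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; the real values live in the kernel and uapi headers. */
#define RWF_APPEND       (1u << 4)
#define IOCB_APPEND      (1u << 1)
#define IOCB_ZONE_APPEND (1u << 8) /* hypothetical value for the proposed flag */

/*
 * Both paths start from the same user-visible request (RWF_APPEND); only
 * files that can report the write location get IOCB_ZONE_APPEND.
 */
static unsigned int iocb_flags_for(unsigned int rwf, bool supports_zone_append)
{
    unsigned int flags = 0;

    if (rwf & RWF_APPEND)
        flags |= supports_zone_append ? IOCB_ZONE_APPEND : IOCB_APPEND;
    return flags;
}
```

Keeping the two kiocb flags distinct means existing `O_APPEND` users on regular filesystems never see the all-or-nothing semantics.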

After having looked into io_uring, I don't think there is anything that
prevents io_uring from picking up the write offset from ki_complete's
res2 argument. As of now io_uring ignores the field, but that can be
changed.

We use ret2 of ki_complete to collect the append offset in io_uring too.
It's just that, unlike aio, it required some work to send it to user space.
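A toy model of the completion plumbing under discussion: the lower layer calls one `ki_complete(iocb, ret, ret2)`-style callback, and each front end decides what to do with `ret2`. The AIO-like side copies it into `res2`; the io_uring-like side has only a 32-bit `res`, so the offset is dropped until the completion is extended. All structure and function names here are hypothetical simplifications, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

struct kiocb_model;
typedef void (*complete_fn)(struct kiocb_model *iocb, long ret, long ret2);

/* Simplified stand-in for struct kiocb: just the completion hook. */
struct kiocb_model {
    complete_fn ki_complete;
};

/* AIO-like front end: both values reach user space. */
struct aio_req {
    struct kiocb_model iocb; /* first member, so the cast below is valid */
    int64_t res, res2;
};

/* io_uring-like front end with a 32-bit res and no second result. */
struct uring_req {
    struct kiocb_model iocb; /* first member, so the cast below is valid */
    int32_t res;
};

static void aio_done(struct kiocb_model *iocb, long ret, long ret2)
{
    struct aio_req *req = (struct aio_req *)iocb;

    req->res = ret;
    req->res2 = ret2; /* the append offset travels along */
}

static void uring_done(struct kiocb_model *iocb, long ret, long ret2)
{
    struct uring_req *req = (struct uring_req *)iocb;

    (void)ret2; /* dropped: nowhere to store it without extra work */
    req->res = (int32_t)ret;
}

/* A zone-append completion reports both bytes written and location. */
static void complete_append(struct kiocb_model *iocb, long bytes, long offset)
{
    iocb->ki_complete(iocb, bytes, offset);
}
```

The lower layer is identical for both front ends; only the io_uring side needs a place to put `ret2`.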


--
Kanchan Joshi
