Re: [PATCH 2/2] zonefs: use zone-append for AIO as well
From: Kanchan Joshi <hidden>
Date: 2020-07-24 13:58:03
Also in:
linux-fsdevel
On Wed, Jul 22, 2020 at 8:22 PM Christoph Hellwig [off-list ref] wrote:
On Wed, Jul 22, 2020 at 12:43:21PM +0000, Johannes Thumshirn wrote:
On 21/07/2020 07:54, Christoph Hellwig wrote:
On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
On 20/07/2020 15:45, Christoph Hellwig wrote:
On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
On a successful completion, the position the data is written to is returned via AIO's res2 field to the calling application.

That is a major, and except for this changelog, undocumented ABI change. We had the whole discussion about reporting append results in a few threads and the issues with that in io_uring. So let's have that discussion there and don't mix it up with how zonefs writes data. Without that a lot of the boilerplate code should also go away.

OK maybe I didn't remember correctly, but wasn't this all around io_uring and how we'd report the location back for raw block device access?

Report the write offset. The author seems to be hell bent on making it block device specific, but that is a horrible idea as it is just as useful for normal file systems (or zonefs).
The patchset only made the feature opt-in, due to the constraints that we had. ZoneFS was always considered, and it fits as well as block-IO. You already know that we did not have enough room in io_uring, which did not really allow us to think of other FS (any-size cached writes). After working on multiple schemes in io_uring, we now have 64 bits, and we will return the absolute offset in bytes (in V4).

But still, it comes at the cost of sacrificing the ability to do a short write, which is fine for zone-append but may trigger a behavior change for regular file-append. A write may become short if it is
- spanning beyond end-of-file
- going beyond the RLIMIT_FSIZE limit
- probably for MAX_NON_LFS as well

We need to fail all of the above cases if we extend the current model to regular FS, and that may break existing file-append users. There is a class of applications which just append without caring about the exact location; the attempt was to not affect these while we try to enable the path for zone-append.

The patches use O/RWF_APPEND, but try to isolate appending-write (IOCB_APPEND) from appending-write-that-returns-location (IOCB_ZONE_APPEND - can be renamed when we actually have all that it takes to apply the feature in regular FS). Enabling block-IO and zoneFS now, and keeping regular-FS as future work - hope that does not sound too bad!
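A minimal sketch of the short-write point above, written as a simplified user-space model rather than kernel code. The helper names (regular_append_len, location_append_len), the single `limit` parameter standing in for end-of-device, RLIMIT_FSIZE or MAX_NON_LFS, and the choice of -EFBIG as the failure code are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of a regular append: a write that would cross
 * `limit` is trimmed to fit, so the caller sees a short write.
 * Returns the number of bytes accepted, or -EFBIG if nothing fits.
 */
static int64_t regular_append_len(uint64_t eof, uint64_t len, uint64_t limit)
{
    if (eof >= limit)
        return -EFBIG;
    if (len > limit - eof)
        return (int64_t)(limit - eof);  /* short write */
    return (int64_t)len;
}

/*
 * Hypothetical model of an append that must report where the data
 * landed: it is all-or-nothing, so a write that would have been
 * trimmed is failed instead (error code assumed, not from the patches).
 */
static int64_t location_append_len(uint64_t eof, uint64_t len, uint64_t limit)
{
    if (eof >= limit || len > limit - eof)
        return -EFBIG;
    return (int64_t)len;
}
```

With eof = 90, len = 20 and limit = 100, the first helper accepts 10 bytes while the second fails the whole request. That difference is the behavior change that existing file-append users could observe.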
quoted
After having looked into io_uring I don't think there is anything that prevents io_uring from picking up the write offset from ki_complete's res2 argument. As of now io_uring ignores the field but that can be changed.
We use ret2 of ki_complete to collect append-offset in io_uring too. It's just that unlike aio it required some work to send it to user-space.
--
Kanchan Joshi
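For readers following the res2/ret2 discussion, here is a rough user-space model of the flow being described: a completion callback receives (ret, ret2) and an aio-style handler copies both into the event that user space reads. The struct and function names below are stand-ins invented for this sketch, not the kernel definitions, and using the second result to carry the append offset is the behavior proposed in this thread, not an established ABI.

```c
#include <stdint.h>

/* Simplified stand-in for a kiocb: only the completion hook and a cookie. */
struct kiocb_sketch {
    void (*ki_complete)(struct kiocb_sketch *iocb, long ret, long ret2);
    void *private_data;
};

/* Stand-in for the res/res2 pair that an AIO completion event exposes. */
struct io_event_sketch {
    int64_t res;
    int64_t res2;
};

/* aio-style completion: forward both results into the user-visible event. */
static void aio_complete_sketch(struct kiocb_sketch *iocb, long ret, long ret2)
{
    struct io_event_sketch *ev = iocb->private_data;

    ev->res = ret;
    ev->res2 = ret2;
}

/* Pretend an append of `len` bytes finished and landed at `written_at`. */
static void complete_append_sketch(struct kiocb_sketch *iocb, long len,
                                   long written_at)
{
    iocb->ki_complete(iocb, len, written_at);
}

/* Drive one completion and return the offset seen on the event side. */
static int64_t demo_append_offset(long len, long written_at)
{
    struct io_event_sketch ev = { 0, 0 };
    struct kiocb_sketch iocb = { aio_complete_sketch, &ev };

    complete_append_sketch(&iocb, len, written_at);
    return ev.res == len ? ev.res2 : -1;
}
```

In this model the aio side only has to copy the second argument through. A completion path that has no spare field for it needs extra plumbing, which matches the "some work to send it to user-space" remark above.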