Re: [PATCH 2/2] zonefs: use zone-append for AIO as well
From: Christoph Hellwig <hch@lst.de>
Date: 2020-07-22 14:52:02
Also in:
linux-fsdevel
On Wed, Jul 22, 2020 at 12:43:21PM +0000, Johannes Thumshirn wrote:
> On 21/07/2020 07:54, Christoph Hellwig wrote:
>> On Mon, Jul 20, 2020 at 04:48:50PM +0000, Johannes Thumshirn wrote:
>>> On 20/07/2020 15:45, Christoph Hellwig wrote:
>>>> On Mon, Jul 20, 2020 at 10:21:18PM +0900, Johannes Thumshirn wrote:
>>>>> On a successful completion, the position the data is written to is returned via AIO's res2 field to the calling application.
>>>>
>>>> That is a major, and except for this changelog, undocumented ABI change. We had the whole discussion about reporting append results in a few threads and the issues with that in io_uring. So let's have that discussion there and don't mix it up with how zonefs writes data. Without that a lot of the boilerplate code should also go away.
>>>
>>> OK maybe I didn't remember correctly, but wasn't this all around io_uring and how we'd report the location back for raw block device access?
>>
>> Report the write offset. The author seems to be hell bent on making it block device specific, but that is a horrible idea as it is just as useful for normal file systems (or zonefs).
>
> After having looked into io_uring I don't think there is anything that prevents io_uring from picking up the write offset from ki_complete's res2 argument. As of now io_uring ignores the field but that can be changed.
Sure. Except for the fact that the io_uring CQE doesn't have space for it. See the currently ongoing discussion on that.
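To illustrate the space problem, here is a rough sketch that mirrors the two completion records in plain C. The struct names and the helper cqe_spare_bytes() are made up for this example; the field layouts are assumed to match struct io_event from <linux/aio_abi.h> and struct io_uring_cqe as it looked around the time of this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of struct io_event (legacy AIO completion). */
struct aio_event_mirror {
    uint64_t data;   /* user data from the iocb */
    uint64_t obj;    /* pointer to the originating iocb */
    int64_t  res;    /* primary result */
    int64_t  res2;   /* secondary result, the field the patch wants to use */
};

/* Assumed mirror of struct io_uring_cqe (16 bytes at the time). */
struct uring_cqe_mirror {
    uint64_t user_data;  /* copied from the submission entry */
    int32_t  res;        /* result code, only 32 bits wide */
    uint32_t flags;
};

/* Hypothetical helper: bytes in the CQE not claimed by an existing field. */
size_t cqe_spare_bytes(void)
{
    return sizeof(struct uring_cqe_mirror)
         - sizeof(uint64_t)   /* user_data */
         - sizeof(int32_t)    /* res */
         - sizeof(uint32_t);  /* flags */
}
```

With these layouts the AIO event has a separate 64-bit res2 slot, while the CQE has no unclaimed bytes and its res is too narrow for a 64-bit file offset.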
> So the only thing that needs to be done from a zonefs perspective is documenting the use of res2 and CC linux-aio and linux-abi (including an update of the io_getevents man page). Or am I completely off track now?
Yes. We should not have a different ABI just for zonefs. We need to support this feature in a generic way and not as a weird one-off for one filesystem and only with the legacy AIO interface. Either way please make sure you properly separate the implementation (using Write vs Zone Append in zonefs) from the interface (returning the actually written offset from appending writes), as they are quite separate issues.
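For reference, a minimal sketch of what the interface half would look like to an application if the written offset were reported the way the patch changelog describes, i.e. via res2 of the AIO completion. This only models the proposed semantics, not an accepted ABI; struct aio_event_sketch and completion_offset() are hypothetical names, and the struct is assumed to mirror struct io_event.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mirror of struct io_event from <linux/aio_abi.h>. */
struct aio_event_sketch {
    uint64_t data;
    uint64_t obj;
    int64_t  res;    /* bytes written, or a negative errno */
    int64_t  res2;   /* under the proposal: offset the data landed at */
};

/*
 * Hypothetical helper: extract the written offset from a completion.
 * Returns 0 and stores the offset on success; on failure returns the
 * negative errno from res and leaves *offset untouched.
 */
int completion_offset(const struct aio_event_sketch *ev, int64_t *offset)
{
    if (ev->res < 0)
        return (int)ev->res;
    *offset = ev->res2;
    return 0;
}
```

Nothing in this helper depends on whether the filesystem issued a regular write or a zone append underneath, which is the separation asked for above.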