Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
From: Dan Williams <hidden>
Date: 2016-05-05 16:24:20
Also in:
linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml
On Thu, May 5, 2016 at 8:22 AM, Christoph Hellwig wrote:
> On Thu, May 05, 2016 at 08:15:32AM -0700, Dan Williams wrote:
>>> Agreed - making O_DIRECT less direct than not having it is plain stupid, and I somehow missed this initially.
>>
>> Of course I disagree because like Dave argues in the msync case we should do the correct thing first and make it fast later, but also like Dave this arguing in circles is getting tiresome.
>
> We should do the right thing first, and make it fast later. But this proposal is not getting it right - it still does not handle errors for the fast path, but magically makes it work for direct I/O by in general using a less optimal path for O_DIRECT. It's getting the worst of all choices.
>
> As far as I can tell the only sensible option is to:
>
>  - always try dax-like I/O first
>  - have a custom get_user_pages + rw_bytes fallback that handles bad blocks when hitting EIO
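A minimal userspace model of the ordering proposed above: try the direct, DAX-style copy first, and only on EIO fall back to a slower driver path that can deal with bad blocks. Everything here is hypothetical and for illustration only: `struct model_pmem`, `model_dax_write()`, `model_rw_bytes_fallback()` and `model_write()` are not kernel interfaces, the 512-byte sector size is an assumption, the fallback assumes that rewriting a whole sector clears its error, and get_user_pages is not modeled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a DAX-capable device: a flat byte array
 * plus a per-sector "bad block" flag. This is not kernel code. */
#define SECTOR_SIZE 512u
#define NR_SECTORS 8u

struct model_pmem {
    unsigned char data[SECTOR_SIZE * NR_SECTORS];
    bool bad[NR_SECTORS];
};

/* Fast path: copy straight into the mapping. Models a copy that fails
 * with -EIO if the range touches a known bad sector. */
static int model_dax_write(struct model_pmem *pm, size_t off,
                           const void *buf, size_t len)
{
    for (size_t s = off / SECTOR_SIZE; s * SECTOR_SIZE < off + len; s++)
        if (pm->bad[s])
            return -EIO;
    memcpy(pm->data + off, buf, len);
    return 0;
}

/* Slow path, loosely modeled on an rw_bytes-style driver write. Assumes a
 * bad sector is cleared only when the write covers all of it. */
static int model_rw_bytes_fallback(struct model_pmem *pm, size_t off,
                                   const void *buf, size_t len)
{
    size_t end = off + len;
    size_t s;

    /* Refuse if any bad sector would only be partially overwritten. */
    for (s = off / SECTOR_SIZE; s * SECTOR_SIZE < end; s++) {
        bool full = off <= s * SECTOR_SIZE && (s + 1) * SECTOR_SIZE <= end;
        if (pm->bad[s] && !full)
            return -EIO;
    }
    memcpy(pm->data + off, buf, len);
    /* Every bad sector in range was fully rewritten, so clearing the
     * flags over the whole range is safe. */
    for (s = off / SECTOR_SIZE; s * SECTOR_SIZE < end; s++)
        pm->bad[s] = false;
    return 0;
}

/* Try the DAX-style path first; fall back only when it reports -EIO. */
static int model_write(struct model_pmem *pm, size_t off,
                       const void *buf, size_t len)
{
    assert(pm && buf);
    if (off > sizeof(pm->data) || len > sizeof(pm->data) - off)
        return -EINVAL;
    int ret = model_dax_write(pm, off, buf, len);
    if (ret == -EIO)
        ret = model_rw_bytes_fallback(pm, off, buf, len);
    return ret;
}
```

Note that even this fallback has to reject a write that only partially covers a bad sector, which is the alignment question raised in the reply.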
If you're on board with more special fallbacks for dax-capable block devices, that indeed opens up the thinking. The O_DIRECT approach was meant to keep the error-clearing model close to the traditional block device case, but yes, that does constrain the implementation in sub-optimal ways. However, we still have the alignment problem in the rw_bytes case: how do we communicate to the application that only writes of a certain size/alignment will clear errors? That forced alignment assumption was the other appeal of O_DIRECT. Perhaps we can at least start with hole punching and block reallocation as the error-clearing method while we think more about the write-to-clear case?
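To make the alignment concern concrete: if errors can only be cleared at some fixed granularity, a write clears only the whole granules it covers, and a partial granule at either end stays bad. This sketch assumes a power-of-two granularity (512 bytes in the checks); `clearable_range()` and `write_clears_fully()` are hypothetical helpers, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte range [start, start + len) whose errors a write could clear. */
struct clear_range {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical helper: for a write of len bytes at off and an assumed
 * power-of-two clearing granularity, return the whole granules covered.
 * Partial granules at either end are excluded. */
static struct clear_range clearable_range(uint64_t off, uint64_t len,
                                          uint64_t gran)
{
    struct clear_range r = { 0, 0 };
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    uint64_t start = (off + gran - 1) & ~(gran - 1); /* round up */
    uint64_t end = (off + len) & ~(gran - 1);        /* round down */
    if (end > start) {
        r.start = start;
        r.len = end - start;
    }
    return r;
}

/* True when both offset and length are granule-aligned, so every byte
 * written lands in a granule the write fully covers. */
static bool write_clears_fully(uint64_t off, uint64_t len, uint64_t gran)
{
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    return len > 0 && ((off | len) & (gran - 1)) == 0;
}
```

For example, a 1000-byte write at offset 100 with 512-byte granules only covers [512, 1024), so errors in the first and last partial granules would survive the write.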
> And then we need to sort out the concurrent write synchronization. Again there I think we absolutely have to obey POSIX for the !O_DIRECT case and can avoid it for O_DIRECT, similar to the existing non-DAX semantics. If we want any special additional semantics we _will_ need a special O_DAX flag.
Ok, makes sense.