Thread: 25 messages, 5 authors, 2016-05-08

Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io

From: Dan Williams <hidden>
Date: 2016-05-05 16:24:20
Also in: linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml

On Thu, May 5, 2016 at 8:22 AM, Christoph Hellwig [off-list ref] wrote:
> On Thu, May 05, 2016 at 08:15:32AM -0700, Dan Williams wrote:
>> > Agreed - making O_DIRECT less direct than not having it is plain stupid,
>> > and I somehow missed this initially.
>>
>> Of course I disagree because like Dave argues in the msync case we
>> should do the correct thing first and make it fast later, but also
>> like Dave this arguing in circles is getting tiresome.
>
> We should do the right thing first, and make it fast later.  But this
> proposal is not getting it right - it still does not handle errors
> for the fast path, but magically makes it work for direct I/O by
> in general using a less optimal path for O_DIRECT.  It's getting the
> worst of all choices.
>
> As far as I can tell the only sensible option is to:
>
>  - always try dax-like I/O first
>  - have a custom get_user_pages + rw_bytes fallback that handles bad
>    blocks when hitting EIO

If you're on board with more special fallbacks for dax-capable block
devices that indeed opens up the thinking.  The O_DIRECT approach was
meant to keep the error clearing model close to the traditional block
device case, but yes that does constrain the implementation in
sub-optimal ways.
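
For illustration only, the "try dax-like I/O first, fall back on EIO"
flow could be sketched in userspace C roughly as below.  This is not
kernel code: toy_dev, toy_dax_write, toy_rw_bytes_write and toy_write
are hypothetical names, the 512-byte sector size is an assumption, and
the rule that only fully overwritten bad sectors get cleared is one
possible policy rather than anything settled in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512u   /* assumed error-tracking granularity */
#define NR_SECTORS  8u

/* Hypothetical stand-in for a dax-capable device with a bad-block list. */
struct toy_dev {
	unsigned char data[NR_SECTORS * SECTOR_SIZE];
	bool bad[NR_SECTORS];
};

/* Reject ranges that fall outside the toy device. */
static int toy_check_range(const struct toy_dev *dev, size_t off, size_t len)
{
	if (off > sizeof(dev->data) || len > sizeof(dev->data) - off)
		return -EINVAL;
	return 0;
}

/* Fast path: a plain copy that fails with -EIO if it touches a bad sector. */
static int toy_dax_write(struct toy_dev *dev, size_t off,
			 const void *buf, size_t len)
{
	int ret = toy_check_range(dev, off, len);

	if (ret || len == 0)
		return ret;
	for (size_t s = off / SECTOR_SIZE; s <= (off + len - 1) / SECTOR_SIZE; s++)
		if (dev->bad[s])
			return -EIO;
	memcpy(dev->data + off, buf, len);
	return 0;
}

/*
 * Slow path: clears a bad sector only when the write covers it completely;
 * a partial overwrite of a bad sector still fails with -EIO.
 */
static int toy_rw_bytes_write(struct toy_dev *dev, size_t off,
			      const void *buf, size_t len)
{
	int ret = toy_check_range(dev, off, len);
	size_t first, last;

	if (ret || len == 0)
		return ret;
	first = off / SECTOR_SIZE;
	last = (off + len - 1) / SECTOR_SIZE;
	for (size_t s = first; s <= last; s++) {
		size_t start = s * SECTOR_SIZE;

		if (dev->bad[s] && (off > start || off + len < start + SECTOR_SIZE))
			return -EIO;
	}
	memcpy(dev->data + off, buf, len);
	/* Every bad sector still in range was fully overwritten above. */
	for (size_t s = first; s <= last; s++)
		dev->bad[s] = false;
	return 0;
}

/* Always try the fast path first; fall back only when it reports -EIO. */
static int toy_write(struct toy_dev *dev, size_t off,
		     const void *buf, size_t len)
{
	int ret = toy_dax_write(dev, off, buf, len);

	if (ret == -EIO)
		ret = toy_rw_bytes_write(dev, off, buf, len);
	return ret;
}
```

In this sketch a sector-aligned, sector-sized write to a bad sector
succeeds through the fallback and clears the error, while a 100-byte
write into the middle of the same sector still returns -EIO.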

However, we still have the alignment problem in the rw_bytes case, how
do we communicate to the application that only writes with a certain
size/alignment will clear errors?  That forced alignment assumption
was the other appeal of O_DIRECT.  Perhaps we can at least start with
hole punching and block reallocation as the error clearing method
while we think more about the write-to-clear case?
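
To make the alignment concern concrete, here is a small sketch of how
one might compute which part of an arbitrary byte-range write could
clear errors at all.  CLEAR_UNIT (512 bytes) and the clearable_range
helper are assumptions for illustration; the real granularity depends
on the device, and the sketch ignores overflow of off + len.

```c
#include <stdint.h>

/* Hypothetical error-clearing granularity; real devices may differ. */
#define CLEAR_UNIT 512u

struct clear_range {
	uint64_t start;	/* first byte of the aligned sub-range */
	uint64_t len;	/* 0 means no whole unit is covered */
};

/*
 * Return the largest CLEAR_UNIT-aligned sub-range of [off, off + len).
 * Only this part of a write fully overwrites whole units, so only it
 * could clear errors under the assumed model.
 */
static struct clear_range clearable_range(uint64_t off, uint64_t len)
{
	struct clear_range r = { 0, 0 };
	uint64_t first = (off + CLEAR_UNIT - 1) / CLEAR_UNIT * CLEAR_UNIT;
	uint64_t end = (off + len) / CLEAR_UNIT * CLEAR_UNIT;

	if (end > first) {
		r.start = first;
		r.len = end - first;
	}
	return r;
}
```

Under these assumptions a 512-byte write at offset 100 covers no whole
unit and so clears nothing, which is the kind of surprise an
application would need to be told about.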

> And then we need to sort out the concurrent write synchronization.
> Again there I think we absolutely have to obey Posix for the !O_DIRECT
> case and can avoid it for O_DIRECT, similar to the existing non-DAX
> semantics.  If we want any special additional semantics we _will_ need
> a special O_DAX flag.

Ok, makes sense.