Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
From: Vishal Verma <vishal@kernel.org>
Date: 2016-05-02 15:51:31
Also in:
linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm
On Mon, 2016-05-02 at 18:41 +0300, Boaz Harrosh wrote:
> On 04/29/2016 12:16 AM, Vishal Verma wrote:
> > All IO in a dax filesystem used to go through dax_do_io, which cannot
> > handle media errors, and thus cannot provide a recovery path that can
> > send a write through the driver to clear errors.
> >
> > Add a new iocb flag for DAX, and set it only for DAX mounts. In the IO
> > path for DAX filesystems, use the same direct_IO path for both DAX and
> > direct_io iocbs, but use the flags to identify when we are in O_DIRECT
> > mode vs non O_DIRECT with DAX, and for O_DIRECT, use the conventional
> > direct_IO path instead of DAX.
>
> Really? What is your thinking here?
>
> What about all the current users of O_DIRECT? You have just made them
> 4 times slower and "less concurrent*" than "buffered io" users, since
> the direct_IO path will queue an IO request and all.
> (And if it is not so slow, then why do we need dax_do_io at all?
> [Rhetorical])
>
> I hate it that you overload the semantics of a known and expected
> O_DIRECT flag for special pmem quirks. This is an incompatible and
> unrelated overload of the semantics of O_DIRECT.

We overloaded O_DIRECT a long time ago when we made DAX piggyback on
the same path:
static inline bool io_is_direct(struct file *filp)
{
	return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host);
}
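
To make the distinction concrete, here is a minimal userspace sketch of the idea in the patch description: record O_DIRECT and DAX as two separate iocb flags, then let O_DIRECT take priority when choosing the IO path. Every `mock_` / `MOCK_` name is hypothetical and exists only for illustration; the real kernel structures, flag names, and dispatch code differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel flags and structs differ. */
#define MOCK_O_DIRECT     0x1u  /* file was opened with O_DIRECT */
#define MOCK_IOCB_DIRECT  0x1u  /* iocb flag: caller asked for O_DIRECT */
#define MOCK_IOCB_DAX     0x2u  /* iocb flag: inode is on a DAX mount */

struct mock_file {
	unsigned int f_flags;   /* open flags */
	bool inode_is_dax;      /* models IS_DAX(inode) */
};

enum mock_io_path {
	MOCK_PATH_BUFFERED,
	MOCK_PATH_DIRECT_IO,
	MOCK_PATH_DAX_IO,
};

/* Models the quoted io_is_direct(): O_DIRECT and DAX look the same. */
static bool mock_io_is_direct(const struct mock_file *filp)
{
	assert(filp != NULL);
	return (filp->f_flags & MOCK_O_DIRECT) || filp->inode_is_dax;
}

/* Record the two conditions as separate flags instead of one boolean. */
static unsigned int mock_iocb_flags(const struct mock_file *filp)
{
	unsigned int flags = 0;

	assert(filp != NULL);
	if (filp->f_flags & MOCK_O_DIRECT)
		flags |= MOCK_IOCB_DIRECT;
	if (filp->inode_is_dax)
		flags |= MOCK_IOCB_DAX;
	return flags;
}

/* O_DIRECT is checked first, so it wins even on a DAX mount. */
static enum mock_io_path mock_pick_path(unsigned int iocb_flags)
{
	if (iocb_flags & MOCK_IOCB_DIRECT)
		return MOCK_PATH_DIRECT_IO;
	if (iocb_flags & MOCK_IOCB_DAX)
		return MOCK_PATH_DAX_IO;
	return MOCK_PATH_BUFFERED;
}
```

In this model a file on a DAX mount opened without O_DIRECT still takes the DAX path, while the same file opened with O_DIRECT takes the conventional direct IO path, which is the behavior change under discussion.
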
Yes O_DIRECT on a DAX mounted file system will now be slower, but -

> > This allows us a recovery path in the form of opening the file with
> > O_DIRECT and writing to it with the usual O_DIRECT semantics (sector
> > alignment restrictions).
>
> I understand that you want a sector aligned IO, right? For the clearing
> of errors. But I hate it that you forced all O_DIRECT IO to be slow for
> this. Can you not make dax_do_io handle media errors? At least for the
> parts of the IO that are aligned.
> (And your recovery path application above can use only aligned IO to
> make sure)
>
> Please look for another solution. Even a special IOCTL_DAX_CLEAR_ERROR

- see all the versions of this series prior to this one, where we try to do a fallback...

> [*"less concurrent" because of the queuing done in bdev. Note how
> pmem is not even multi-queue, and even if it was it will be much
> slower than DAX because of the code depth and all the locks and task
> switches done in the block layer. In DAX the final memcpy is done
> directly on the user-mode thread]
>
> Thanks
> Boaz
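
For readers unfamiliar with the recovery path mentioned in the quoted patch description, the following is a rough userspace sketch of what "opening the file with O_DIRECT and writing to it with the usual O_DIRECT semantics" could look like. It assumes Linux with glibc and a 512-byte sector size; the helper names and the choice to write zeroes are illustrative assumptions (a real recovery tool would write restored data), and whether such a write actually clears a media error depends on the driver.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512         /* assumption: 512-byte logical sectors */

/* True when offset and length are both whole multiples of the sector size. */
static bool sector_aligned(off_t offset, size_t len)
{
	return len > 0 && offset % SECTOR_SIZE == 0 && len % SECTOR_SIZE == 0;
}

/*
 * Hypothetical helper: overwrite one aligned range with zeroes through
 * O_DIRECT. Returns 0 on success and -1 on any failure.
 */
static int rewrite_range_odirect(const char *path, off_t offset, size_t len)
{
	void *buf = NULL;
	int fd;
	int ret = -1;

	if (!sector_aligned(offset, len))
		return -1;      /* refuse unaligned requests up front */

	/* O_DIRECT generally wants the user buffer aligned as well. */
	if (posix_memalign(&buf, SECTOR_SIZE, len) != 0)
		return -1;
	assert((uintptr_t)buf % SECTOR_SIZE == 0);
	memset(buf, 0, len);

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd >= 0) {
		if (pwrite(fd, buf, len, offset) == (ssize_t)len)
			ret = 0;
		close(fd);
	}
	free(buf);
	return ret;
}
```

The alignment check is done before any IO so that an unaligned request fails cleanly in the application, matching the remark that a recovery tool can restrict itself to aligned IO.
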