Thread (33 messages) 33 messages, 5 authors, 2019-11-11

Re: [PATCH 5/5] fs/xfs: Allow toggle of physical DAX flag

From: Ira Weiny <hidden>
Date: 2019-10-21 22:49:36
Also in: linux-fsdevel, linux-xfs, lkml

On Mon, Oct 21, 2019 at 11:45:36AM +1100, Dave Chinner wrote:
On Sun, Oct 20, 2019 at 08:59:35AM -0700, ira.weiny@intel.com wrote:
quoted
@@ -1232,12 +1233,10 @@ xfs_diflags_to_linux(
 		inode->i_flags |= S_NOATIME;
 	else
 		inode->i_flags &= ~S_NOATIME;
-#if 0	/* disabled until the flag switching races are sorted out */
 	if (xflags & FS_XFLAG_DAX)
 		inode->i_flags |= S_DAX;
 	else
 		inode->i_flags &= ~S_DAX;
-#endif
This code has bit-rotted. See xfs_setup_iops(), where we now have a
different inode->i_mapping->a_ops for DAX inodes.
:-(
That, fundamentally, is the issue here - it's not setting/clearing
the DAX flag that is the issue, it's doing a swap of the
mapping->a_ops while there may be other code using that ops
structure.

IOWs, if there is any code anywhere in the kernel that
calls an address space op without holding one of the three locks we
hold here (i_rwsem, MMAPLOCK, ILOCK) then it can race with the swap
of the address space operations.

By limiting the address space swap to file sizes of zero, we rule
out the page fault path (mmap of a zero length file segv's with an
access beyond EOF on the first read/write page fault, right?).
Yes I checked that and thought we were safe here...
However, other aops callers that might run unlocked and do the wrong
thing if the aops pointer is swapped between check of the aop method
existing and actually calling it even if the file size is zero?

A quick look shows that FIBMAP (ioctl_fibmap())) looks susceptible
to such a race condition with the current definitions of the XFS DAX
aops. I'm guessing there will be others, but I haven't looked
further than this...
I'll check for others and think on what to do about this.  ext4 will have the
same problem I think.  :-(

I don't suppose using a single a_ops for both DAX and non-DAX is palatable?
quoted
 	/* lock, flush and invalidate mapping in preparation for flag change */
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL | XFS_IOLOCK_EXCL);
+
+	if (i_size_read(inode) != 0) {
+		error = -EOPNOTSUPP;
+		goto out_unlock;
+	}
Wrong error. Should be the same as whatever is returned when we try
to change the extent size hint and can't because the file is
non-zero in length (-EINVAL, I think). Also needs a comment
explainging why this check exists, and probably better written as
i_size_read() > 0 ....
Done, done, and done.

Thank you,
Ira
Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help