Re: [PATCH 13/13] ext4: Support for synchronous DAX faults

[RFC PATCH 0/13 v2] dax, ext4: Synchronous page faults · Jan Kara <jack@suse.cz> · 2017-08-17
[PATCH 03/13] dax: Factor out getting of pfn out of iomap · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 03/13] dax: Factor out getting of pfn out of iomap · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 03/13] dax: Factor out getting of pfn out of iomap · Christoph Hellwig <hch@infradead.org> · 2017-08-23
[PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 04/13] dax: Create local variable for VMA in dax_iomap_pte_fault() · Christoph Hellwig <hch@infradead.org> · 2017-08-23
[PATCH 01/13] mm: Remove VM_FAULT_HWPOISON_LARGE_MASK · Jan Kara <jack@suse.cz> · 2017-08-17
[PATCH 02/13] dax: Simplify arguments of dax_insert_mapping() · Jan Kara <jack@suse.cz> · 2017-08-17
[PATCH 11/13] dax, iomap: Add support for synchronous faults · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Jan Kara <jack@suse.cz> · 2017-08-22
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Jan Kara <jack@suse.cz> · 2017-08-22
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Christoph Hellwig <hch@infradead.org> · 2017-08-24
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Jan Kara <jack@suse.cz> · 2017-08-24
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Christoph Hellwig <hch@infradead.org> · 2017-08-24
Re: [PATCH 11/13] dax, iomap: Add support for synchronous faults · Jan Kara <jack@suse.cz> · 2017-08-24
[PATCH 10/13] mm: Wire up MAP_SYNC · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Jan Kara <jack@suse.cz> · 2017-08-22
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Jan Kara <jack@suse.cz> · 2017-08-22
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Dan Williams <hidden> · 2017-08-22
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Christoph Hellwig <hch@infradead.org> · 2017-08-23
Re: [PATCH 10/13] mm: Wire up MAP_SYNC · Jan Kara <jack@suse.cz> · 2017-08-24
[PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite() · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 12/13] dax: Implement dax_insert_pfn_mkwrite() · Ross Zwisler <hidden> · 2017-08-21
[PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn · Christoph Hellwig <hch@infradead.org> · 2017-08-23
Re: [PATCH 09/13] dax: Allow dax_iomap_fault() to return pfn · Jan Kara <jack@suse.cz> · 2017-08-24
[PATCH 13/13] ext4: Support for synchronous DAX faults · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Ross Zwisler <hidden> · 2017-08-21
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Jan Kara <jack@suse.cz> · 2017-08-22
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Christoph Hellwig <hch@infradead.org> · 2017-08-23
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Jan Kara <jack@suse.cz> · 2017-08-24
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Christoph Hellwig <hch@infradead.org> · 2017-08-24
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Christoph Hellwig <hch@infradead.org> · 2017-08-24
Re: [PATCH 13/13] ext4: Support for synchronous DAX faults · Jan Kara <jack@suse.cz> · 2017-08-24
[PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 05/13] dax: Create local variable for vmf->flags & FAULT_FLAG_WRITE test · Christoph Hellwig <hch@infradead.org> · 2017-08-23
[PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 06/13] dax: Inline dax_insert_mapping() into the callsite · Christoph Hellwig <hch@infradead.org> · 2017-08-23
[PATCH 08/13] dax: Fix comment describing dax_iomap_fault() · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 08/13] dax: Fix comment describing dax_iomap_fault() · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 08/13] dax: Fix comment describing dax_iomap_fault() · Christoph Hellwig <hch@infradead.org> · 2017-08-23
[PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite · Jan Kara <jack@suse.cz> · 2017-08-17
Re: [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite · Ross Zwisler <hidden> · 2017-08-18
Re: [PATCH 07/13] dax: Inline dax_pmd_insert_mapping() into the callsite · Christoph Hellwig <hch@infradead.org> · 2017-08-23

From: Ross Zwisler <hidden>
Date: 2017-08-21 19:19:50
Also in: linux-fsdevel, linux-xfs, nvdimm

On Thu, Aug 17, 2017 at 06:08:15PM +0200, Jan Kara wrote:

We return IOMAP_F_NEEDDSYNC flag from ext4_iomap_begin() for a
synchronous write fault when inode has some uncommitted metadata
changes. In the fault handler ext4_dax_fault() we then detect this case,
call vfs_fsync_range() to make sure all metadata is committed, and call
dax_pfn_mkwrite() to mark PTE as writeable. Note that this will also

Need to fix up the above line a little -
s/dax_pfn_mkwrite/dax_insert_pfn_mkwrite/, and we insert the PTE as well as
make it writeable.

quoted hunk ↗ jump to hunk

dirty corresponding radix tree entry which is what we want - fsync(2)
will still provide data integrity guarantees for applications not using
userspace flushing. And applications using userspace flushing can avoid
calling fsync(2) and thus avoid the performance overhead.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c       | 36 ++++++++++++++++++++++++++++++------
 fs/ext4/inode.c      |  4 ++++
 fs/jbd2/journal.c    | 17 +++++++++++++++++
 include/linux/jbd2.h |  1 +
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 850037e140d7..3765c4ed1368 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c

@@ -280,6 +280,7 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct super_block *sb = inode->i_sb;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
+	pfn_t pfn;
 
 	if (write) {
 		sb_start_pagefault(sb);

@@ -287,16 +288,39 @@ static int ext4_dax_huge_fault(struct vm_fault *vmf,
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 		handle = ext4_journal_start_sb(sb, EXT4_HT_WRITE_PAGE,
 					       EXT4_DATA_TRANS_BLOCKS(sb));
+		if (IS_ERR(handle)) {
+			up_read(&EXT4_I(inode)->i_mmap_sem);
+			sb_end_pagefault(sb);
+			return VM_FAULT_SIGBUS;
+		}
 	} else {
 		down_read(&EXT4_I(inode)->i_mmap_sem);
 	}
-	if (!IS_ERR(handle))
-		result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, NULL);
-	else
-		result = VM_FAULT_SIGBUS;
+	result = dax_iomap_fault(vmf, pe_size, &ext4_iomap_ops, &pfn);
 	if (write) {
-		if (!IS_ERR(handle))
-			ext4_journal_stop(handle);
+		ext4_journal_stop(handle);
+		/* Write fault but PFN mapped only RO? */

The above comment is out of date.

+		if (result & VM_FAULT_NEEDDSYNC) {
+			int err;
+			loff_t start = ((loff_t)vmf->pgoff) << PAGE_SHIFT;
+			size_t len = 0;
+
+			if (pe_size == PE_SIZE_PTE)
+				len = PAGE_SIZE;
+#ifdef CONFIG_FS_DAX_PMD
+			else if (pe_size == PE_SIZE_PMD)
+				len = HPAGE_PMD_SIZE;

In fs/dax.c we always use PMD_SIZE.  It looks like HPAGE_PMD_SIZE and PMD_SIZE
are always the same (from include/linux/huge_mm.h, the only defintion of
HPAGE_PMD_SIZE):

#define HPAGE_PMD_SHIFT PMD_SHIFT
#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)

and AFAICT PMD_SIZE is defined to be 1<<PMD_SHIFT for all architectures as
well.  I don't understand why we have both?

In any case, neither HPAGE_PMD_SIZE nor PMD_SIZE are used anywhere else in the
ext4 code, so can we use PMD_SIZE here for consistency?  If they ever did
manage to be different, I think we'd want PMD_SIZE anyway.

With those nits and an updated changelog:

Reviewed-by: Ross Zwisler <redacted>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help