Thread (64 messages) 64 messages, 4 authors, 2021-12-16

Re: [PATCH v2 19/28] iomap: Convert __iomap_zero_iter to use a folio

From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-12-16 19:36:18
Also in: linux-fsdevel, linux-xfs, lkml

On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
quoted hunk ↗ jump to hunk
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
quoted
+++ b/fs/iomap/buffered-io.c
@@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+	struct folio *folio;
 	struct page *page;
 	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
+	size_t offset, bytes;
 
-	status = iomap_write_begin(iter, pos, bytes, &page);
+	status = iomap_write_begin(iter, pos, length, &page);
This turned out to be buggy.  Darrick and I figured out why his tests
were failing and mine weren't; this only shows up with a 4kB block
size filesystem and I was only testing with 1kB block size filesystems.
(at least on x86; I haven't figured out why it passes with 1kB block size
filesystems, so I'm not sure what would be true on other filesystems).
iomap_write_begin() is not prepared to deal with a length that spans a
page boundary.  So I'm replacing this patch with the following patches
(whitespace damaged; pick them up from
https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
if you want to compile them):

commit 412212960b72
Author: Matthew Wilcox (Oracle) [off-list ref]
Date:   Thu Dec 9 15:47:44 2021 -0500

    iomap: Allow iomap_write_begin() to be called with the full length

    In the future, we want write_begin to know the entire length of the
    write so that it can choose to allocate large folios.  Pass the full
    length in from __iomap_zero_iter() and limit it where necessary.

    Signed-off-by: Matthew Wilcox (Oracle) [off-list ref]
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d67108489148..9270db17c435 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
        struct gfs2_sbd *sdp = GFS2_SB(inode);
        unsigned int blocks;

+       /* gfs2 does not support large folios yet */
+       if (len > PAGE_SIZE)
+               len = PAGE_SIZE;
This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
supports large folios, so it should never be asked to deal with more
than a page at a time.  Shouldn't iomap_write_begin clamp its len
argument to PAGE_SIZE at the start if the mapping doesn't have the large
folios flag set?

--D
quoted hunk ↗ jump to hunk
        blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
        return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d7a67655b60..67fcd3b9928d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
                goto out_no_page;
        }
        folio = page_folio(page);
+       if (pos + len > folio_pos(folio) + folio_size(folio))
+               len = folio_pos(folio) + folio_size(folio) - pos;

        if (srcmap->type == IOMAP_INLINE)
                status = iomap_write_begin_inline(iter, page);
@@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff
_t pos, u64 length)
        struct page *page;
        int status;
        unsigned offset = offset_in_page(pos);
-       unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);

-       status = iomap_write_begin(iter, pos, bytes, &page);
+       if (length > UINT_MAX)
+               length = UINT_MAX;
+       status = iomap_write_begin(iter, pos, length, &page);
        if (status)
                return status;
+       if (length > PAGE_SIZE - offset)
+               length = PAGE_SIZE - offset;

-       zero_user(page, offset, bytes);
+       zero_user(page, offset, length);
        mark_page_accessed(page);

-       return iomap_write_end(iter, pos, bytes, bytes, page);
+       return iomap_write_end(iter, pos, length, length, page);
 }

 static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)


commit 78c747a1b3a1
Author: Matthew Wilcox (Oracle) [off-list ref]
Date:   Fri Nov 5 14:24:09 2021 -0400

    iomap: Convert __iomap_zero_iter to use a folio
    
    The zero iterator can work in folio-sized chunks instead of page-sized
    chunks.  This will save a lot of page cache lookups if the file is cached
    in large folios.
    
    Signed-off-by: Matthew Wilcox (Oracle) [off-list ref]
    Reviewed-by: Christoph Hellwig [off-list ref]
    Reviewed-by: Darrick J. Wong [off-list ref]
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 67fcd3b9928d..bbde6d4f27cd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
 
 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+       struct folio *folio;
        struct page *page;
        int status;
-       unsigned offset = offset_in_page(pos);
+       size_t offset;
 
        if (length > UINT_MAX)
                length = UINT_MAX;
        status = iomap_write_begin(iter, pos, length, &page);
        if (status)
                return status;
-       if (length > PAGE_SIZE - offset)
-               length = PAGE_SIZE - offset;
+       folio = page_folio(page);
 
-       zero_user(page, offset, length);
-       mark_page_accessed(page);
+       offset = offset_in_folio(folio, pos);
+       if (length > folio_size(folio) - offset)
+               length = folio_size(folio) - offset;
+       folio_zero_range(folio, offset, length);
+       folio_mark_accessed(folio);
 
        return iomap_write_end(iter, pos, length, length, page);
 }

The xfstests that Darrick identified as failing all passed.  Running a
full sweep now; then I'll re-run with a 1kB filesystem to be sure that
still passes.  Then I'll send another pull request.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help