Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES

From: Pankaj Raghav (Samsung) <hidden>
Date: 2026-06-18 08:19:21
Also in: linux-fsdevel, linux-xfs

On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:

On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:

On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:

[API questions for Zhang and -fsdevel / -api below]

+	unsigned int		blksize = i_blocksize(inode);
+	loff_t			offset_aligned = round_down(offset, blksize);

I think this actually needs to round up instead of rounding down.

+	/*
+	 * Zero the tail of the old EOF block and any space up to the new
+	 * offset.
+	 * In the usual truncate path, xfs_falloc_setsize takes care of
+	 * zeroing those blocks.
+	 */
+	if (offset_aligned > old_size) {
+		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
+		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
+				NULL, &did_zero);
+		if (error)
+			return error;
+	}

... then this will properly zero from the old i_size to the first block
boundary after the old size.

Hmm, right now we do this:

|----------|----------|----------|
    ^      ^     ^    ^
    |      |     |    |
 old_size  |   offset |
           |          |
        off_rd       off_ru

At the moment, we zero out old_size to off_rd and pass offset to
xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.

What you are proposing is to zero out old_size to off_ru, and pass
off_ru to xfs_alloc_file_space. I don't exactly understand the
difference.
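
To make the difference concrete, here is a small userspace sketch of the
arithmetic only; it is not kernel code. The helpers blk_round_down(),
blk_round_up() and eof_zero_len() are hypothetical stand-ins for the kernel's
round_down()/round_up() and for the "offset_aligned > old_size" check in the
patch. The block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for round_down(); blksize must be a power of two. */
static inline uint64_t blk_round_down(uint64_t x, uint64_t blksize)
{
	assert(blksize && (blksize & (blksize - 1)) == 0);
	return x & ~(blksize - 1);
}

/* Hypothetical userspace stand-in for round_up(); blksize must be a power of two. */
static inline uint64_t blk_round_up(uint64_t x, uint64_t blksize)
{
	assert(blksize && (blksize & (blksize - 1)) == 0);
	return (x + blksize - 1) & ~(blksize - 1);
}

/*
 * Number of bytes the EOF-zeroing step would cover, i.e. the length of
 * [old_size, target). Returns 0 when target <= old_size, which mirrors
 * the branch in the patch being skipped.
 */
static inline uint64_t eof_zero_len(uint64_t old_size, uint64_t target)
{
	return target > old_size ? target - old_size : 0;
}
```

With a 4096-byte block, old_size = 5000 and offset = 6000, rounding down
gives 4096, which is below old_size, so the zeroing branch is skipped.
Rounding up gives 8192, so the bytes in [5000, 8192) would be zeroed.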

IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases. If the
'offset' and 'end' are not block-size aligned, then:

1) if the two blocks straddling the boundaries have not yet been allocated,
   or allocated as unwritten, we should round outward the allocation range
   and zero out all allocated blocks, including those two boundary blocks.
2) if the blocks at the boundaries are already in the written state, which
   can occur when we call FALLOC_FL_WRITE_ZEROES within the file size, we
   should be careful: we should only zero the ranges [offset, offset_ru)
   and [end_rd, end) for the boundary blocks, leaving the already-written
   portions of the boundary blocks intact.
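
As a rough illustration of case 2, the following userspace sketch splits a
byte range into the two partial boundary pieces and the block-aligned middle.
It only shows the arithmetic; struct byte_range, struct wz_split and
wz_split_range() are hypothetical names, not XFS code, and the block size is
assumed to be a power of two. Case 1 (rounding the allocation outward) is not
modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open byte range [start, start + len). */
struct byte_range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical result of splitting [offset, offset + len) at block boundaries. */
struct wz_split {
	struct byte_range head;   /* [offset, offset_ru): partial first block */
	struct byte_range middle; /* [offset_ru, end_rd): whole blocks only */
	struct byte_range tail;   /* [end_rd, end): partial last block */
};

static struct wz_split wz_split_range(uint64_t offset, uint64_t len,
				      uint64_t blksize)
{
	struct wz_split s = {0};
	uint64_t mask = blksize - 1;
	uint64_t end = offset + len;
	uint64_t offset_ru = (offset + mask) & ~mask; /* round offset up */
	uint64_t end_rd = end & ~mask;                /* round end down */

	assert(blksize && (blksize & mask) == 0);

	if (offset_ru > end_rd) {
		/* The range sits inside one block and touches neither boundary. */
		s.head = (struct byte_range){ offset, len };
		return s;
	}

	s.head = (struct byte_range){ offset, offset_ru - offset };
	s.middle = (struct byte_range){ offset_ru, end_rd - offset_ru };
	s.tail = (struct byte_range){ end_rd, end - end_rd };
	return s;
}
```

For offset = 5000 and len = 15000 with 4096-byte blocks, this yields a head
of [5000, 8192), a middle of [8192, 16384) and a tail of [16384, 20000). Only
the head and tail would need byte-granular zeroing that leaves the rest of
their blocks intact.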

Thoughts?

Ok, this makes sense to me.

@Christoph, now I understand your reply about rounding up and rounding
down.

So, I could do xfs_zero_range(offset, offset_ru)[1] and xfs_zero_range(end_rd, end).
The middle range [offset_ru, end_rd) will use the accelerated XFS_BMAPI_ZERO
to zero out the extents.

I also need to add pagecache_isize_extended and filemap_write_and_wait_range
to persist the xfs_zero_range calls before we call setfilesize.

xfs_zero_range should take care of the boundary blocks so that we neither
overwrite any data nor zero out unallocated or unwritten blocks, as
pointed out in 1) and 2).

Let me know what you think. I am also wondering how fsx did not trigger
the boundary block edge case where the current impl might zero out user
data in the boundary blocks.

[1] if old_size < offset, then xfs_zero_range(old_size, offset_ru)
--
Pankaj
