Thread (18 messages) 18 messages, 3 authors, 2d ago

Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES

From: Zhang Yi <hidden>
Date: 2026-06-18 03:22:53
Also in: linux-fsdevel, linux-xfs

On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
quoted
[API questions for Zhang and -fsdevel/ -api below)
quoted
+	unsigned int		blksize = i_blocksize(inode);
+	loff_t			offset_aligned = round_down(offset, blksize);
I think this actually needs to found up instead of rounding down.
quoted
+	/*
+	 * Zero the tail of the old EOF block and any space up to the new
+	 * offset.
+	 * In the usual truncate path, xfs_falloc_setsize takes care of
+	 * zeroing those blocks.
+	 */
+	if (offset_aligned > old_size) {
+		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
+		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
+				NULL, &did_zero);
+		if (error)
+			return error;
+	}
... then this will properly zero from the old i_size to the first block
boundary after the old size.
Hmm, right now we do this:

|----------|----------|----------|
    ^      ^     ^    ^
    |      |     |    |
 old_size  |   offset |
           |          |
	off_rd       off_ru

At the moment, we zero out old_size to off_rd and pass offset to
xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to off_rd.

What you are proposing is to zero out old_size to off_ru, and pass
off_ru to xfs_alloc_file_space. I don't exactly understand the
difference.
IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases, if the
'offset' and 'end' are not block-size aligned, then:

1) if the two blocks straddling the boundaries have not yet been allocated,
   or allocated as unwritten, we should round outward the allocation range
   and zero out all allocated blocks, including those two boundary blocks.
2) if the blocks at the boundaries are already in the written state — which
   can occur when we call FALLOC_FL_WRITE_ZEROES within the file size. We
   should be careful here: we should only zero the ranges [offset, offset_ru)
   and [end_rd, end) for the boundary blocks, leaving the already-written
   portions of the boundary blocks intact.

Thoughs?

Regarding the second point, the current ext4 implementation has an issue —
it zeroes out the entire boundary blocks. I overlooked this previously, and
I appreciate you pointing it out.
quoted
quoted
+	error = xfs_alloc_file_space(ip, offset, len,
+			XFS_ALLOC_FILE_SPACE_WRITE_ZEROES);
... and here we need to pass offset_aligned instead of offset and
a new calculated len based on the last block boundary, and then
zero again after that.  That is assuming FALLOC_FL_WRITE_ZEROES
allows unaligned ranges for file systems.  The block code doesn't,
but I can't quite follow the ext4 code if it does or not, and there
is no mention of FALLOC_FL_WRITE_ZEROES even in the latest man-pages
tree.

I can't find any references to FALLOC_FL_WRITE_ZEROES in the man pages
master branch. Maybe we missed it. I can send a separate patch for that
once we have some clarity on the API.
Yeah, I missed to update the man pages last year. Thanks.

Best Regards,
Yi.
quoted
Maybe we also want xfstests that try unaligned FALLOC_FL_WRITE_ZEROES
and make sure no existing data before the range is lost and the
entire range is zeroed?
I added FALLOC_FL_WRITE_ZEROES support to ltp (both fsx and fsstress).
For example, generic/363 tests for unaligned writes and checks for any
stale data. By default, I think we do unaligned reads, writes and
truncate in fsx.
quoted
quoted
+	if (error)
+		return error;
+
+	/*
+	 * xfs_falloc_setsize() would re-zero the written extents via
+	 * iomap_zero_range(). Use xfs_setfilesize() instead.
+	 * Update in-core i_size first as xfs_setfilesize() clamps the on-disk
+	 * size to it.
+	 */
+	if (new_size > i_size_read(inode))
+		i_size_write(inode, new_size);
I think Sashiko is right that we need a pagecache_isize_extended and
filemap_write_and_wait_range calls here.
Ok. Current fsx or fsstress did not expose this
problem. I will look into this. Thanks Christoph.

--
Pankaj
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help