Re: [PATCH v6 3/3] xfs: add support for FALLOC_FL_WRITE_ZEROES
From: Pankaj Raghav (Samsung) <hidden>
Date: 2026-06-18 08:19:21
Also in:
linux-fsdevel, linux-xfs
On Thu, Jun 18, 2026 at 11:22:45AM +0800, Zhang Yi wrote:
> On 6/17/2026 5:44 PM, Pankaj Raghav (Samsung) wrote:
>> On Tue, Jun 16, 2026 at 06:31:40AM -0700, Christoph Hellwig wrote:
>>> [API questions for Zhang and -fsdevel/-api below]
>>>
>>>> +	unsigned int blksize = i_blocksize(inode);
>>>> +	loff_t offset_aligned = round_down(offset, blksize);
>>>
>>> I think this actually needs to round up instead of rounding down.
>>>
>>>> +	/*
>>>> +	 * Zero the tail of the old EOF block and any space up to the new
>>>> +	 * offset.
>>>> +	 * In the usual truncate path, xfs_falloc_setsize takes care of
>>>> +	 * zeroing those blocks.
>>>> +	 */
>>>> +	if (offset_aligned > old_size) {
>>>> +		trace_xfs_zero_eof(ip, old_size, offset_aligned - old_size);
>>>> +		error = xfs_zero_range(ip, old_size, offset_aligned - old_size,
>>>> +				NULL, &did_zero);
>>>> +		if (error)
>>>> +			return error;
>>>> +	}
>>>
>>> ... then this will properly zero from the old i_size to the first block
>>> boundary after the old size.
>>
>> Hmm, right now we do this:
>>
>> |----------|----------|----------|
>>      ^     ^    ^     ^
>>      |     |    |     |
>>  old_size  | offset   |
>>            |          |
>>         off_rd     off_ru
>>
>> At the moment, we zero out old_size to off_rd and pass offset to
>> xfs_alloc_file_space. xfs_alloc_file_space rounds down the offset to
>> off_rd.
>>
>> What you are proposing is to zero out old_size to off_ru, and pass
>> off_ru to xfs_alloc_file_space. I don't exactly understand the
>> difference.
>
> IMO, FALLOC_FL_WRITE_ZEROES should handle the unaligned cases. If the
> 'offset' and 'end' are not block-size aligned, then:
>
> 1) if the two blocks straddling the boundaries have not yet been
>    allocated, or are allocated as unwritten, we should round the
>    allocation range outward and zero out all allocated blocks,
>    including those two boundary blocks.
> 2) if the blocks at the boundaries are already in the written state
>    (which can occur when we call FALLOC_FL_WRITE_ZEROES within the file
>    size), we should be careful: we should only zero the ranges
>    [offset, offset_ru) and [end_rd, end) for the boundary blocks,
>    leaving the already-written portions of the boundary blocks intact.
>
> Thoughts?

Ok, this makes sense to me. @Christoph, now I understand your reply about
rounding up and rounding down.

So, I could do xfs_zero_range(offset, offset_ru)[1] and
xfs_zero_range(end_rd, end). (offset_ru, end_rd) will be using the
accelerated XFS_BMAPI_ZERO to zero out the extents. I also need to add
pagecache_isize_extended and filemap_write_and_wait_range to persist the
xfs_zero_range calls before we call setfilesize.

xfs_zero_range should take care of the boundary blocks so that we don't
overwrite any data or zero out the unallocated or unwritten blocks, as
pointed out in 1) and 2).

Let me know what you think.

I am also wondering how fsx did not trigger the boundary block edge case
where the current impl might zero out user data in the boundary blocks.

[1] if old_size < offset, then xfs_zero_range(old_size, offset_ru)

--
Pankaj
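
To make the range split discussed above concrete, here is a rough
userspace sketch of the arithmetic only. It is not kernel code and not
what the patch does; struct wz_split, wz_round_down, wz_round_up and
wz_split_range are hypothetical names that do not exist in XFS. It
assumes the partial head and tail would go through an
xfs_zero_range-style path, the whole-block middle would be the candidate
for XFS_BMAPI_ZERO, and the head starts at old_size when old_size <
offset, as in footnote [1].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace illustration only. The struct and function names below are
 * hypothetical and do not exist in XFS.
 */
struct wz_split {
	int64_t head_start, head_end;	/* partial head: zero only these bytes */
	int64_t mid_start, mid_end;	/* whole blocks: XFS_BMAPI_ZERO candidate */
	int64_t tail_start, tail_end;	/* partial tail: zero only these bytes */
};

static int64_t wz_round_down(int64_t x, int64_t blksize)
{
	return x - (x % blksize);
}

static int64_t wz_round_up(int64_t x, int64_t blksize)
{
	return wz_round_down(x + blksize - 1, blksize);
}

static struct wz_split wz_split_range(int64_t offset, int64_t len,
				      int64_t old_size, int64_t blksize)
{
	struct wz_split s;
	int64_t end = offset + len;
	int64_t offset_ru, end_rd;

	assert(blksize > 0 && offset >= 0 && len > 0 && old_size >= 0);

	offset_ru = wz_round_up(offset, blksize);
	end_rd = wz_round_down(end, blksize);

	/* Footnote [1]: when extending past EOF, start zeroing at the old EOF. */
	s.head_start = old_size < offset ? old_size : offset;

	if (offset_ru > end_rd) {
		/* Range sits inside one block: there is no whole block to convert. */
		s.head_end = end;
		s.mid_start = s.mid_end = end;
		s.tail_start = s.tail_end = end;
		return s;
	}

	s.head_end = offset_ru;		/* [offset or old_size, offset_ru) */
	s.mid_start = offset_ru;	/* [offset_ru, end_rd) */
	s.mid_end = end_rd;
	s.tail_start = end_rd;		/* [end_rd, end) */
	s.tail_end = end;
	return s;
}
```

For example, with 4096-byte blocks, offset 1000, len 10000 and old_size
20000, this gives head [1000, 4096), middle [4096, 8192) and tail
[8192, 11000). An empty range is represented as start == end.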