Re: [PATCH 2/5] iomap: use accelerated zeroing on a block device to zero a file range
From: Dave Chinner <david@fromorbit.com>
Date: 2021-09-21 22:33:48
Also in: linux-fsdevel
On Fri, Sep 17, 2021 at 06:30:55PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Create a function that ensures that the storage backing part of a file
> contains zeroes and will not trip over old media errors if the contents
> are re-read.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/iomap/direct-io.c  |   75 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/iomap.h |    3 ++
>  2 files changed, 78 insertions(+)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 4ecd255e0511..48826a49f976 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -652,3 +652,78 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	return iomap_dio_complete(dio);
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
> +
> +static loff_t
> +iomap_zeroinit_iter(struct iomap_iter *iter)
> +{
> +	struct iomap *iomap = &iter->iomap;
> +	const struct iomap *srcmap = iomap_iter_srcmap(iter);
> +	const u64 start = iomap->addr + iter->pos - iomap->offset;
> +	const u64 nr_bytes = iomap_length(iter);
> +	sector_t sector = start >> SECTOR_SHIFT;
> +	sector_t nr_sectors = nr_bytes >> SECTOR_SHIFT;
> +	int ret;
> +
> +	if (!iomap->bdev)
> +		return -ECANCELED;
> +
> +	/* The physical extent must be sector-aligned for block layer APIs. */
> +	if ((start | nr_bytes) & (SECTOR_SIZE - 1))
> +		return -EINVAL;
> +
> +	/* Must be able to zero storage directly without fs intervention. */
> +	if (iomap->flags & IOMAP_F_SHARED)
> +		return -ECANCELED;
> +	if (srcmap != iomap)
> +		return -ECANCELED;
> +
> +	switch (iomap->type) {
> +	case IOMAP_MAPPED:
> +		ret = blkdev_issue_zeroout(iomap->bdev, sector, nr_sectors,
> +				GFP_KERNEL, 0);

Pretty sure this needs to use BLKDEV_ZERO_NOUNMAP. The whole point of
this is having zeroed space allocated and ready for write on return, so
having the hardware optimise away the physical storage zeroing by
punching a hole in its backing store, and then potentially getting
ENOSPC on the next write to this range, would be .... suboptimal.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
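Before the quoted hunk ever reaches blkdev_issue_zeroout(), it converts the file range into a disk byte address and then into 512-byte sectors, rejecting anything that is not sector-aligned. The userspace sketch below models only that arithmetic, assuming 512-byte sectors (SECTOR_SHIFT of 9, as in the kernel). struct extent_map and range_to_sectors() are hypothetical names standing in for the struct iomap fields and the inline code; the sketch does not model the zeroing call itself or the flag choice discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* 512-byte sectors, matching the kernel's SECTOR_SHIFT/SECTOR_SIZE. */
#define SECTOR_SHIFT 9
#define SECTOR_SIZE (1U << SECTOR_SHIFT)

/* Hypothetical stand-in for the two struct iomap fields the hunk reads. */
struct extent_map {
	uint64_t addr;   /* disk byte address where the mapping starts */
	uint64_t offset; /* file byte offset where the mapping starts */
};

/*
 * Hypothetical helper mirroring the arithmetic at the top of
 * iomap_zeroinit_iter(): map the file range [pos, pos + len) to a disk
 * byte address, reject it if either the address or the length is not a
 * whole number of sectors, else report the first sector and the count.
 */
static int range_to_sectors(const struct extent_map *map, uint64_t pos,
			    uint64_t len, uint64_t *sector,
			    uint64_t *nr_sectors)
{
	/* The caller must pass a position inside the mapping. */
	assert(pos >= map->offset);

	uint64_t start = map->addr + pos - map->offset;

	/* OR-ing both values tests their alignment with a single mask. */
	if ((start | len) & (SECTOR_SIZE - 1))
		return -1; /* the kernel code returns -EINVAL here */

	*sector = start >> SECTOR_SHIFT;
	*nr_sectors = len >> SECTOR_SHIFT;
	return 0;
}
```

For example, with addr = 1048576 and offset = 65536, file position 69632 maps to disk byte 1052672, so an 8192-byte range becomes 16 sectors starting at sector 2056; a 100-byte length fails the alignment check.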