Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices
From: Andreas Dilger <hidden>
Date: 2021-09-21 21:40:03
On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti [off-list ref] wrote:
From: Sarthak Kukreti <redacted>

This patch adds an extended option "assume_storage_prezeroed" to mke2fs. When enabled, this option acts as a hint to mke2fs that the underlying block device was zeroed before mke2fs was called. This allows mke2fs to optimize out the zeroing of the inode table and the journal, which speeds up the filesystem creation time.

Additionally, on thinly provisioned storage devices (like Ceph, dm-thin),
... and newly-created sparse loopback files
reads on unmapped extents return zero. This property allows mke2fs (with assume_storage_prezeroed) to avoid pre-allocating metadata space for inode tables for the entire filesystem and saves space that would normally be preallocated for zero inode tables.

Testing on ChromeOS (running Linux kernel 4.19) with dm-thin and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
- Time taken by mke2fs drops from 1.07s to 0.08s.
- Avoiding zeroing out the inode table and journal reduces the initial metadata space allocation from 0.48% to 0.01%.
- Lazy inode table zeroing results in a further 1.45% of logical volume space getting allocated for inode tables, even if no file data is added to the filesystem. With assume_storage_prezeroed, the metadata allocation remains at 0.01%.
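The property the patch relies on can be illustrated with a sparse regular file standing in for an unmapped thin-provisioned extent (an assumption: dm-thin and Ceph themselves are not exercised). The hypothetical helper `probe_sparse_zero()` below is not part of e2fsprogs; it extends a temporary file in /tmp (assumed to support sparse files) with `ftruncate()`, checks that a never-written range reads back as zeroes, then writes those zeroes explicitly and reports `st_blocks` before and after. On common Linux filesystems the explicit write allocates blocks, which is the kind of space cost the option avoids; filesystems that detect zero blocks may behave differently.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unlinked temporary file in /tmp (assumed
 * to support sparse files), extend it to `size` bytes with ftruncate(), and
 * check that `len` never-written bytes at `off` read back as zero.  Then
 * write those zeroes explicitly, as a zeroing mkfs would.
 *
 * Returns 1 if the hole read as zero, 0 if it did not, -1 on error.
 * *blocks_before and *blocks_after receive st_blocks (512-byte units)
 * before and after the explicit zero write.
 */
static int probe_sparse_zero(off_t size, off_t off, size_t len,
			     long long *blocks_before, long long *blocks_after)
{
	char path[] = "/tmp/sparse-XXXXXX";
	unsigned char *buf = NULL;
	struct stat st;
	int fd, ret = -1;

	if (off < 0 || len == 0 || (off_t)len > size - off)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* file vanishes when fd is closed */

	buf = malloc(len);
	if (!buf || ftruncate(fd, size) != 0 || fstat(fd, &st) != 0)
		goto out;
	*blocks_before = (long long)st.st_blocks;

	/* Nothing was written here, so this read is served from a hole. */
	if (pread(fd, buf, len, off) != (ssize_t)len)
		goto out;
	ret = 1;
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0) {
			ret = 0;
			break;
		}
	}

	/* Explicit zero writes normally allocate real blocks. */
	memset(buf, 0, len);
	if (pwrite(fd, buf, len, off) != (ssize_t)len || fsync(fd) != 0 ||
	    fstat(fd, &st) != 0) {
		ret = -1;
		goto out;
	}
	*blocks_after = (long long)st.st_blocks;
out:
	free(buf);
	close(fd);
	return ret;
}
```

For example, probing 4 KiB in the middle of a 64 MiB sparse file should return 1, with `*blocks_after` typically larger than `*blocks_before` on a filesystem that stores written zeroes.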
This seems beneficial, but I'm wondering if this could also be done automatically when TRIM/DISCARD is used by mke2fs to erase a device? One safe option to do this automatically would be to start by *reading* the disk blocks and check if they are all zero, and only switch to zero-block writes if any block is found with non-zero data. That would avoid the extra space usage from zero-block writes in the above cases, and also work for the huge majority of users that won't know the "assume_storage_prezeroed" option even exists, though it won't necessarily reduce the runtime.
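The read-before-write idea above could look roughly like the following sketch. `range_is_zero()` and `CHUNK_SIZE` are hypothetical names, not part of mke2fs or libext2fs, and the 1 MiB chunk size is an arbitrary choice. The helper reads a byte range with `pread()` and stops at the first non-zero byte, at which point a caller would fall back to writing zeroes. Reads of unmapped thin-provisioned extents do not normally allocate space, but scanning a large range still costs I/O time, which matches the caveat that runtime may not improve.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)	/* read granularity (arbitrary) */

/* Returns 1 if buf[0..len) is all zero bytes; requires len > 0. */
static int chunk_is_zero(const unsigned char *buf, size_t len)
{
	/* buf[0] is zero and every byte equals its successor => all zero */
	return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}

/*
 * Hypothetical helper: returns 1 if [start, start + length) of fd reads
 * back as all zeroes, 0 as soon as a non-zero byte is found (the caller
 * must then write zeroes), or -1 on error, including reaching EOF before
 * the end of the range.
 */
static int range_is_zero(int fd, off_t start, off_t length)
{
	unsigned char *buf = malloc(CHUNK_SIZE);
	int ret = 1;

	if (!buf)
		return -1;
	while (length > 0) {
		size_t want = length < CHUNK_SIZE ? (size_t)length : CHUNK_SIZE;
		ssize_t n = pread(fd, buf, want, start);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted: retry the same read */
		if (n <= 0) {		/* I/O error or unexpected EOF */
			ret = -1;
			break;
		}
		if (!chunk_is_zero(buf, (size_t)n)) {
			ret = 0;
			break;
		}
		start += n;
		length -= n;
	}
	free(buf);
	return ret;
}
```

A caller might, for instance, run this over each inode table range and only issue zero writes for ranges where it returns 0 (or -1, to stay safe).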
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 04b2fbce..5293d9b0 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
 		io_channel_set_options(fs->io, opt_string);
 	}
 
+  if (assume_storage_prezeroed) {
+    if (verbose)
+      printf("%s",
+             _("Assuming the storage device is prezeroed "
+             "- skipping inode table and journal wipe\n"));
+
+    lazy_itable_init = 1;
+    itable_zeroed = 1;
+    zero_hugefile = 0;
+    journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
+  }
Indentation appears to be broken here: only 2 spaces instead of a tab.

This is also missing any kind of test case. Since a large number of the e2fsck test cases use loopback filesystems created on sparse files, using this option there would both provide good test coverage and reduce the time and space used during testing.

Cheers, Andreas