Thread (9 messages), 5 authors, 2022-03-11

Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices

From: Andreas Dilger <hidden>
Date: 2021-09-21 21:40:03

On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti [off-list ref] wrote:
From: Sarthak Kukreti <redacted>

This patch adds an extended option "assume_storage_prezeroed" to
mke2fs. When enabled, this option acts as a hint to mke2fs that
the underlying block device was zeroed before mke2fs was called.
This allows mke2fs to optimize out the zeroing of the inode
table and the journal, which speeds up the filesystem creation
time.

Additionally, on thinly provisioned storage devices (like Ceph,
dm-thin),

... and newly-created sparse loopback files

reads on unmapped extents return zero. This property
allows mke2fs (with assume_storage_prezeroed) to avoid
pre-allocating metadata space for inode tables for the entire
filesystem and saves space that would normally be preallocated
for zero inode tables.

Testing on ChromeOS (running linux kernel 4.19) with dm-thin
and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':

- Time taken by mke2fs drops from 1.07s to 0.08s.
- Avoiding zeroing out the inode table and journal reduces the
 initial metadata space allocation from 0.48% to 0.01%.
- Lazy inode table zeroing results in a further 1.45% of logical
 volume space getting allocated for inode tables, even if no file
 data is added to the filesystem. With assume_storage_prezeroed,
 the metadata allocation remains at 0.01%.

This seems beneficial, but I'm wondering if this could also be
done automatically when TRIM/DISCARD is used by mke2fs to erase
a device?

One safe option to do this automatically would be to start by
*reading* the disk blocks and check if they are all zero, and only
switch to zero-block writes if any block is found with non-zero
data.  That would avoid the extra space usage from zero-block
writes in the above cases, and also work for the vast majority of
users who won't know the "assume_storage_prezeroed" option even
exists, though it won't necessarily reduce the runtime.
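
A minimal sketch of the read-before-zero check described above, assuming
plain pread() on a file descriptor rather than the io_channel layer that
mke2fs actually uses. The names buf_is_zero() and range_is_zero() are
hypothetical and are not part of e2fsprogs:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if buf[0..len) contains only zero bytes. */
static bool buf_is_zero(const unsigned char *buf, size_t len)
{
	/*
	 * The buffer is all zero iff its first byte is zero and every
	 * byte equals the byte that follows it.
	 */
	return len == 0 ||
	       (buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0);
}

/*
 * Check whether the byte range [offset, offset + len) of fd reads back
 * as zeros.  Returns 1 if it is all zero, 0 if non-zero data was found
 * (the caller should then fall back to writing zero blocks), and -1 on
 * a read error or if the range extends past the end of the device.
 */
static int range_is_zero(int fd, off_t offset, size_t len)
{
	unsigned char buf[65536];

	assert(offset >= 0);
	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t got = pread(fd, buf, want, offset);

		if (got <= 0)
			return -1;
		if (!buf_is_zero(buf, (size_t)got))
			return 0;
		offset += got;
		len -= (size_t)got;
	}
	return 1;
}
```

A caller would run range_is_zero() over each inode table (and the journal)
and only issue zero-block writes for ranges where it returns 0, so nothing
is written, and no thin-provisioned space is allocated, when the device
already reads back as zeros.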
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index 04b2fbce..5293d9b0 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
		io_channel_set_options(fs->io, opt_string);
	}

+	if (assume_storage_prezeroed) {
+	  if (verbose)
+			printf("%s",
+				       _("Assuming the storage device is prezeroed "
+                         "- skipping inode table and journal wipe\n"));
+
+	  lazy_itable_init = 1;
+	  itable_zeroed = 1;
+	  zero_hugefile = 0;
+	  journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
+	}

Indentation appears to be broken here - only 2 spaces instead of a tab.

This is also missing any kind of test case.  Since a large number of
the e2fsck test cases use loopback filesystems created on a sparse
file, these would make good test cases for the new option, and using
it there would also reduce the time/space used during testing.
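
To illustrate the property such a test would rely on, here is a small
sketch showing that a newly created sparse file reads back as zeros.
It assumes a writable /tmp; make_sparse_image() and block_reads_zero()
are hypothetical helper names and are not taken from the e2fsprogs test
suite:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an unlinked sparse temporary file of `size` bytes.
 * Returns an open file descriptor, or -1 on error.
 */
static int make_sparse_image(off_t size)
{
	char path[] = "/tmp/sparse-img-XXXXXX";
	int fd = mkstemp(path);

	assert(size >= 0);
	if (fd < 0)
		return -1;
	unlink(path);		/* the file disappears when fd is closed */
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Read one 4096-byte block at `offset`.  Returns 1 if it is all zero,
 * 0 if it contains non-zero data, -1 on a short read or error.
 */
static int block_reads_zero(int fd, off_t offset)
{
	unsigned char buf[4096];
	ssize_t got = pread(fd, buf, sizeof(buf), offset);

	if (got != (ssize_t)sizeof(buf))
		return -1;
	for (size_t i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Because ftruncate() only extends the file size without writing data, any
block in the image reads back as zeros, which is the condition the
"assume_storage_prezeroed" hint expects.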

Cheers, Andreas



