Thread: 9 messages, 5 authors, 2022-03-11

Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices

From: Sarthak Kukreti <hidden>
Date: 2021-09-27 10:43:58

Thanks for reviewing the patch, Andreas!

On Tue, Sep 21, 2021 at 2:39 PM Andreas Dilger [off-list ref] wrote:
On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti [off-list ref] wrote:
quoted
is
From: Sarthak Kukreti <redacted>
...
quoted
Additionally, on thinly provisioned storage devices (like Ceph,
dm-thin),
... and newly-created sparse loopback files
Thanks for pointing that out; added to the commit message in v2.
...
quoted
Testing on ChromeOS (running Linux kernel 4.19) with dm-thin
and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':

- Time taken by mke2fs drops from 1.07s to 0.08s.
- Avoiding zeroing out the inode table and journal reduces the
 initial metadata space allocation from 0.48% to 0.01%.
- Lazy inode table zeroing results in a further 1.45% of logical
 volume space getting allocated for inode tables, even if no file
 data is added to the filesystem. With assume_storage_prezeroed,
 the metadata allocation remains at 0.01%.
This seems beneficial, but I'm wondering if this could also be
done automatically when TRIM/DISCARD is used by mke2fs to erase
a device?

One safe option to do this automatically would be to start by
*reading* the disk blocks and checking whether they are all zero, and only
switch to zero-block writes if any block is found with non-zero
data.  That would avoid the extra space usage from zero-block
writes in the above cases, and also work for the huge majority of
users that won't know the "assume_storage_prezeroed" option even
exists, though it won't necessarily reduce the runtime.
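Andreas's read-first idea can be sketched as below. This is a hypothetical helper written for illustration, not code from mke2fs or libext2fs: `zero_range_read_first()` and its parameters are invented, and it uses plain POSIX `pread()`/`pwrite()` on a file descriptor rather than the library's I/O manager. It reads blocks only while they are still zero, and at the first non-zero block it switches to writing zero blocks for the rest of the range.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if every byte of buf[0..len) is zero. */
static bool buf_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/*
 * Hypothetical helper: walk nblocks blocks of blksz bytes starting at byte
 * offset `start`. Blocks are only read while they are still all zero; at the
 * first non-zero block, stop reading and write zero blocks from there to the
 * end of the range. Returns the number of blocks written, or -1 on error.
 */
static long zero_range_read_first(int fd, off_t start, size_t nblocks, size_t blksz)
{
    unsigned char *buf = malloc(blksz);
    unsigned char *zeros = calloc(1, blksz);
    bool writing = false;
    long written = 0;

    if (buf == NULL || zeros == NULL) {
        free(buf);
        free(zeros);
        return -1;
    }
    for (size_t i = 0; i < nblocks; i++) {
        off_t off = start + (off_t)(i * blksz);

        if (!writing) {
            ssize_t n = pread(fd, buf, blksz, off);
            if (n < 0) {
                written = -1;
                break;
            }
            /* Bytes past EOF were not read; they read back as zero anyway. */
            memset(buf + n, 0, blksz - (size_t)n);
            if (buf_is_zero(buf, blksz))
                continue;          /* still zero: no write needed */
            writing = true;        /* first dirty block: switch to writing */
        }
        if (pwrite(fd, zeros, blksz, off) != (ssize_t)blksz) {
            written = -1;
            break;
        }
        written++;
    }
    free(buf);
    free(zeros);
    return written;
}
```

The trade-off Andreas mentions is visible in the loop: on a device that is already zero, every block is read and none is written, which saves writes (and thin-pool space) but not runtime.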
I agree with Ted (quoting his reply from a forked thread below) that
reading all the inode table blocks on the device could slow mke2fs down
considerably, depending on the storage medium and size. Maybe this could
instead be done at first mount in conjunction with lazy_itable_init, i.e.
ext4 reads each block and only issues a zero-out if the block is not
already zero? Even so, an explicit hint would be compatible with this
approach: if the hint was passed at creation time, ext4 could skip the
unnecessary read of every inode table block.
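The first-mount variant, combined with a creation-time hint, might look roughly like this. It is an in-memory model, not ext4 code: `struct group` and `lazy_init_group()` are hypothetical, and reads and zero-outs are simulated on a buffer. The flag value matches `EXT2_BG_INODE_ZEROED` in e2fsprogs and `EXT4_BG_INODE_ZEROED` in the kernel, the per-group "inode table zeroed" bit that a hint like `itable_zeroed` would plausibly map to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as EXT2_BG_INODE_ZEROED (e2fsprogs) / EXT4_BG_INODE_ZEROED (kernel). */
#define BG_INODE_ZEROED 0x0004

/* Hypothetical, heavily simplified block group: flags plus an in-memory inode table. */
struct group {
    uint16_t bg_flags;
    unsigned char *itable;     /* itable_blocks * blksz bytes */
    size_t itable_blocks;
};

static bool block_is_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/*
 * Sketch of the proposed first-mount pass for one group. If the group already
 * carries the "inode table zeroed" hint, do nothing at all. Otherwise read each
 * inode-table block and zero only the blocks that are not already zero.
 * Returns the number of blocks zeroed; *blocks_read counts the reads issued.
 */
static size_t lazy_init_group(struct group *g, size_t blksz, size_t *blocks_read)
{
    size_t zeroed = 0;

    if (g->bg_flags & BG_INODE_ZEROED)
        return 0;                        /* hint set: no reads, no writes */
    for (size_t b = 0; b < g->itable_blocks; b++) {
        unsigned char *blk = g->itable + b * blksz;

        (*blocks_read)++;                /* models reading the block from disk */
        if (!block_is_zero(blk, blksz)) {
            memset(blk, 0, blksz);       /* models a zero-out / write-same request */
            zeroed++;
        }
    }
    g->bg_flags |= BG_INODE_ZEROED;      /* record that the group is now clean */
    return zeroed;
}
```

With the hint set, the pass issues no I/O at all, which is the case Sarthak argues an explicit creation-time option still buys you.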

On Wed, Sep 22, 2021 at 8:57 PM Theodore Ts'o [off-list ref] wrote:
The problem is mke2fs really does need to care about the performance
of discard or write same.  Users want mke2fs to be fast, especially
during the distro installation process.  That's why we implemented the
lazy inode table initialization feature in the first place.  So
reading each block from the inode table to see if it's zero might
be slow, and so we might be better off just doing the lazy itable init
instead.
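To put the cost of reading every inode-table block in perspective, here is a back-of-the-envelope sketch. It assumes the common mke2fs.conf defaults of `inode_ratio = 16384` and `inode_size = 256`, ignores per-group rounding and flex_bg layout, and uses a purely illustrative 100 MB/s read throughput; the helper names are hypothetical. Under those assumptions the inode tables are 1/64 (about 1.56%) of the device, the same order as the 1.45% figure in the commit message, so a 200 GiB volume has about 3.1 GiB of inode tables, or roughly 34 seconds of reads compared with the 1.07 s mke2fs run measured above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate inode-table size: one inode of inode_size bytes for every
 * inode_ratio bytes of device. Ignores per-group rounding.
 */
static uint64_t itable_bytes(uint64_t dev_bytes, uint64_t inode_ratio, uint64_t inode_size)
{
    return dev_bytes / inode_ratio * inode_size;
}

/* Time to read `bytes` at an assumed sustained throughput. */
static double read_seconds(uint64_t bytes, double bytes_per_sec)
{
    return (double)bytes / bytes_per_sec;
}
```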
...
quoted
+     if (assume_storage_prezeroed) {
+       if (verbose)
+                     printf("%s",
+                                    _("Assuming the storage device is prezeroed "
+                         "- skipping inode table and journal wipe\n"));
+
+       lazy_itable_init = 1;
+       itable_zeroed = 1;
+       zero_hugefile = 0;
+       journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
+     }
Indentation appears to be broken here - only 2 spaces instead of a tab.

This is also missing any kind of test case.  Since a large number of
the e2fsck test cases use loopback filesystems created on sparse
files, this option would both make for good test cases and reduce the
time/space used during testing.
Oops, thanks for catching that! Fixed in v2, and I added a test case
for this option. I was playing around with adding the option as a
default to tests/mke2fs.conf.in; that didn't affect the overall test
run time much (a lot of the tests seem to be dd'ing entire files and
not using sparse files).
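The reason prezeroing matters for sparse files and thin volumes is that writing zeros still allocates space. A minimal sketch that shows this on a sparse temporary file, assuming Linux semantics where `st_blocks` counts 512-byte units; `make_sparse_file()`, `write_zero_blocks()` and `allocated_bytes()` are hypothetical helpers. On ext4 or tmpfs the allocated size typically grows by about the amount of zeros written, while filesystems that detect or compress zero blocks may behave differently.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes actually allocated on disk for fd (st_blocks is in 512-byte units on Linux). */
static long long allocated_bytes(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Create an unlinked temp file of `size` logical bytes without writing any data. */
static int make_sparse_file(off_t size)
{
    char path[] = "/tmp/sparseXXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write nblocks blocks of zeros from offset 0, the way an eager zero-out would. */
static int write_zero_blocks(int fd, size_t nblocks, size_t blksz)
{
    unsigned char *zeros = calloc(1, blksz);

    if (zeros == NULL)
        return -1;
    for (size_t i = 0; i < nblocks; i++) {
        if (pwrite(fd, zeros, blksz, (off_t)(i * blksz)) != (ssize_t)blksz) {
            free(zeros);
            return -1;
        }
    }
    free(zeros);
    return 0;
}
```

Comparing `allocated_bytes()` before and after `write_zero_blocks()` shows the space cost that `assume_storage_prezeroed` is meant to avoid when the backing storage already reads back as zero.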

Best
Sarthak