Re: [PATCH v5 2/3] ext4: add ioctl EXT4_IOC_CHECKPOINT
From: "Theodore Ts'o" <tytso@mit.edu>
Date: 2021-06-23 01:38:33
Subsystem:
ext4 file system, filesystems (vfs and infrastructure), the rest · Maintainers:
"Theodore Ts'o", Alexander Viro, Christian Brauner, Linus Torvalds
On Tue, May 18, 2021 at 03:13:26PM +0000, Leah Rumancik wrote:
ioctl EXT4_IOC_CHECKPOINT checkpoints and flushes the journal. This includes forcing all the transactions to the log, checkpointing the transactions, and flushing the log to disk. This ioctl takes u32 "flags" as an argument. Three flags are supported. EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN can be used to verify input to the ioctl. It returns error if there is any invalid input, otherwise it returns success without performing any checkpointing. The other two flags, EXT4_IOC_CHECKPOINT_FLAG_DISCARD and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT, can be used to issue requests to discard or zeroout the journal logs blocks, respectively. At this point, EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT is primarily added to enable testing of this codepath on devices that don't support discard. EXT4_IOC_CHECKPOINT_FLAG_DISCARD and EXT4_IOC_CHECKPOINT_FLAG_ZEROOUT cannot both be set. Systems that wish to achieve content deletion SLO can set up a daemon that calls this ioctl at a regular interval such that it matches with the SLO requirement. Thus, with this patch, the ext4_dir_entry2 wipeout patch[1], and the Ext4 "-o discard" mount option set, Ext4 can now guarantee that all file contents, file metatdata, and filenames will not be accessible through the filesystem and will have had discard or zeroout requests issued for corresponding device blocks. The __jbd2_journal_erase function could also be used to discard or zero-fill the journal during journal load after recovery. This would provide a potential solution to a journal replay bug reported earlier this year[2]. After a successful journal recovery, e2fsck can call this ioctl to discard the journal as well. [1] https://lore.kernel.org/linux-ext4/YIHknqxngB1sUdie@mit.edu/ (local) [2] https://lore.kernel.org/linux-ext4/YDZoaacIYStFQT8g@mit.edu/ (local) Signed-off-by: Leah Rumancik <redacted>
FYI, I've made the following change to this commit in the ext4 tree, in order to fix a test failure of ext4/050 when running on a block device that don't support discard --- for example, when running the ext4/dax test config. Cheers, - Ted
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1290fbda1399..5730aeca563c 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c@@ -805,6 +805,7 @@ static int ext4_ioctl_checkpoint(struct file *filp, unsigned long arg) __u32 flags = 0; unsigned int flush_flags = 0; struct super_block *sb = file_inode(filp)->i_sb; + struct request_queue *q; if (copy_from_user(&flags, (__u32 __user *)arg, sizeof(__u32)))
@@ -822,6 +823,15 @@ static int ext4_ioctl_checkpoint(struct file *filp, unsigned long arg) if (!EXT4_SB(sb)->s_journal) return -ENODEV; + if (flags & ~JBD2_JOURNAL_FLUSH_VALID) + return -EINVAL; + + q = bdev_get_queue(EXT4_SB(sb)->s_journal->j_dev); + if (!q) + return -ENXIO; + if ((flags & JBD2_JOURNAL_FLUSH_DISCARD) && !blk_queue_discard(q)) + return -EOPNOTSUPP; + if (flags & EXT4_IOC_CHECKPOINT_FLAG_DRY_RUN) return 0;