Re: [PATCH 3/3] ext4: prevent getting empty inode buffer

From: Zhang Yi <yi.zhang@huawei.com>
Date: 2021-08-16 14:29:10
Subsystem: ext4 file system, filesystems (vfs and infrastructure), the rest · Maintainers: "Theodore Ts'o", Alexander Viro, Christian Brauner, Linus Torvalds

On 2021/8/13 21:44, Jan Kara wrote:

On Tue 10-08-21 22:27:22, Zhang Yi wrote:

In ext4_get_inode_loc(), we may skip I/O and get a zeroed but uptodate
inode buffer when the inode is the only in-use inode in its inode block;
this is done for performance reasons. In most cases ext4_mark_iloc_dirty()
will fill the inode buffer and make it valid, but we can miss this call if
something goes wrong. Eventually, __ext4_get_inode_loc_noinmem() may get an
empty inode buffer and trigger an ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), which leaves an uptodate but zeroed buffer. We
will then get a checksum error message in ext4_iget() when getting the
inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B in the same inode block, it
will corrupt inode A on disk when inode B is written back.

So this patch clears the uptodate flag and marks the buffer new if we get
an empty buffer, and clears that mark again after we fill in the inode data
or issue read I/O.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Thanks for the fix! Really good catch! The patch looks correct but
honestly, I'm not very happy about the special buffer_new handling. It
looks correct but I'm a bit uneasy that e.g. the block device code can
access this buffer and manipulate its state. Cannot we instead e.g. check
whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it,
if still not uptodate, zero it, mark as uptodate, unlock it and then call
ext4_do_update_inode()? That would seem like a bit more foolproof solution
to me. Basically the fact that the buffer is not uptodate in
ext4_mark_iloc_dirty() would mean that nobody else is past
__ext4_get_inode_loc() for another inode in that buffer and so zeroing is
safe.
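
The check, lock, re-check, zero, mark uptodate, unlock sequence suggested above can be sketched in userspace. This is only a toy model under stated assumptions: `struct toy_bh` and every `toy_*` function are hypothetical stand-ins for `struct buffer_head`, `lock_buffer()`, `unlock_buffer()` and the uptodate flag, not kernel code, and the spin lock is a simplification of the real buffer lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct toy_bh {
	atomic_bool locked;    /* models lock_buffer()/unlock_buffer() */
	atomic_bool uptodate;  /* models the buffer uptodate flag */
	unsigned char data[TOY_BLOCK_SIZE];
};

static void toy_bh_init(struct toy_bh *bh)
{
	atomic_init(&bh->locked, false);
	atomic_init(&bh->uptodate, false);
	/* Stale contents: the block was never read from disk. */
	memset(bh->data, 0xAA, sizeof(bh->data));
}

static void toy_lock_buffer(struct toy_bh *bh)
{
	/* Spin until we are the one who flips locked from false to true. */
	while (atomic_exchange(&bh->locked, true))
		;
}

static void toy_unlock_buffer(struct toy_bh *bh)
{
	atomic_store(&bh->locked, false);
}

/*
 * Models the check proposed for ext4_mark_iloc_dirty().
 * Returns true if this call zeroed the buffer, false if the buffer
 * was already uptodate and was left untouched.
 */
static bool toy_prepare_inode_buffer(struct toy_bh *bh)
{
	bool zeroed = false;

	if (atomic_load(&bh->uptodate))
		return false;               /* fast path, no lock taken */

	toy_lock_buffer(bh);
	if (!atomic_load(&bh->uptodate)) {  /* re-check under the lock */
		memset(bh->data, 0, sizeof(bh->data));
		atomic_store(&bh->uptodate, true);
		zeroed = true;
	}
	toy_unlock_buffer(bh);
	return zeroed;
}
```

The re-check under the lock is what keeps a second caller from zeroing data that a first caller has already filled in: once the flag is set, later calls return without touching the buffer.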

Thanks for your suggestion! I understand your concern, and your approach
looks fine except for marking the buffer uptodate right after zeroing it
in ext4_mark_iloc_dirty(). I think (1) if ext4_do_update_inode() returns
an error before filling the inode, it will still leave an uptodate but
zeroed buffer, and that error path is not easy to handle; (2) it still
does not conform to the semantics of buffer uptodate, because the buffer
does not contain uptodate inode information. How about moving the
mark-as-uptodate step into ext4_do_update_inode(), something like this
(not tested)?

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index eae1b2d0b550..99ccba8d47c6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c

@@ -4368,8 +4368,6 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
                brelse(bitmap_bh);
                if (i == start + inodes_per_block) {
                        /* all other inodes are free, so skip I/O */
-                       memset(bh->b_data, 0, bh->b_size);
-                       set_buffer_uptodate(bh);
                        unlock_buffer(bh);
                        goto has_buffer;
                }

@@ -5132,6 +5130,9 @@ static int ext4_do_update_inode(handle_t *handle,
        if (err)
                goto out_brelse;
        ext4_clear_inode_state(inode, EXT4_STATE_NEW);
+       if (!buffer_uptodate(bh))
+               set_buffer_uptodate(bh);
+
        if (set_large_file) {
                BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access");
                err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);

@@ -5712,6 +5713,13 @@ int ext4_mark_iloc_dirty(handle_t *handle,
        /* the do_update_inode consumes one bh->b_count */
        get_bh(iloc->bh);

+       if (!buffer_uptodate(iloc->bh)) {
+               lock_buffer(iloc->bh);
+               if (!buffer_uptodate(iloc->bh))
+                       memset(iloc->bh->b_data, 0, iloc->bh->b_size);
+               unlock_buffer(iloc->bh);
+       }
+
        /* ext4_do_update_inode() does jbd2_journal_dirty_metadata */
        err = ext4_do_update_inode(handle, inode, iloc);
        put_bh(iloc->bh);
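
To illustrate the intent of the draft above, here is a small userspace sketch of the ordering it aims for: the buffer is zeroed while still not uptodate, and the uptodate flag is set only after the inode has actually been copied in, so an early error leaves the buffer not uptodate. Everything here is an assumption made for illustration: `struct toy_bh`, the `toy_*` functions, the `fail_early` parameter and the block and inode sizes are hypothetical, and buffer locking is omitted for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define TOY_BLOCK_SIZE 256
#define TOY_INODE_SIZE 128

/* Hypothetical stand-in for struct buffer_head; not the kernel type. */
struct toy_bh {
	atomic_bool uptodate;
	unsigned char data[TOY_BLOCK_SIZE];
};

static void toy_bh_init(struct toy_bh *bh)
{
	atomic_init(&bh->uptodate, false);
	memset(bh->data, 0xAA, sizeof(bh->data)); /* stale contents */
}

/*
 * Models ext4_do_update_inode(): fail_early simulates an error that is
 * returned before the raw inode is filled in. The flag is set only on
 * the success path, after the copy.
 */
static int toy_do_update_inode(struct toy_bh *bh, unsigned int slot,
			       const unsigned char *raw, bool fail_early)
{
	if (slot >= TOY_BLOCK_SIZE / TOY_INODE_SIZE)
		return -1;
	if (fail_early)
		return -1;      /* buffer stays not uptodate */

	memcpy(bh->data + slot * TOY_INODE_SIZE, raw, TOY_INODE_SIZE);
	atomic_store(&bh->uptodate, true);
	return 0;
}

/* Models ext4_mark_iloc_dirty(): zero first, but do not mark uptodate. */
static int toy_mark_iloc_dirty(struct toy_bh *bh, unsigned int slot,
			       const unsigned char *raw, bool fail_early)
{
	if (!atomic_load(&bh->uptodate))
		memset(bh->data, 0, sizeof(bh->data)); /* locking omitted */

	return toy_do_update_inode(bh, slot, raw, fail_early);
}
```

In this model a failed update leaves a zeroed buffer that is still flagged as not uptodate, so a later reader would know it has to fetch the block rather than trust the zeroes.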

Thanks,
Yi.
