Re: possible deadlock in start_this_handle (2)
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: 2021-02-19 10:18:06
Also in:
linux-mm, lkml
On 2021/02/15 23:29, Jan Kara wrote:
On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:quoted
On 2021/02/15 21:45, Jan Kara wrote:quoted
On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:quoted
Excuse me, but it seems to me that nothing prevents ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create() without memalloc_nofs_save() when hitting ext4_get_nojournal() path. Will you explain when ext4_get_nojournal() path is executed?That's a good question but sadly I don't think that's it. ext4_get_nojournal() is called when the filesystem is created without a journal. In that case we also don't acquire jbd2_handle lockdep map. In the syzbot report we can see:Since syzbot can test filesystem images, syzbot might have tested a filesystem image created both with and without journal within this boot.a) I think that syzbot reboots the VM between executing different tests to get reproducible conditions. But in theory I agree the test may have contained one image with and one image without a journal.
syzkaller reboots the VM upon a crash. syzkaller can test multiple filesystem images within one boot. https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered ( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller. /* Just increment the non-pointer handle value */ static handle_t *ext4_get_nojournal(void) { 86 handle_t *handle = current->journal_info; unsigned long ref_cnt = (unsigned long)handle; BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT); 86 ref_cnt++; handle = (handle_t *)ref_cnt; current->journal_info = handle; 2006 return handle; } /* Decrement the non-pointer handle value */ static void ext4_put_nojournal(handle_t *handle) { unsigned long ref_cnt = (unsigned long)handle; 85 BUG_ON(ref_cnt == 0); 85 ref_cnt--; handle = (handle_t *)ref_cnt; current->journal_info = handle; } handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line, int type, int blocks, int rsv_blocks, int revoke_creds) { journal_t *journal; int err; 2006 trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds, 2006 _RET_IP_); 2006 err = ext4_journal_check_start(sb); if (err < 0) return ERR_PTR(err); 2005 journal = EXT4_SB(sb)->s_journal; 1969 if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)) 2006 return ext4_get_nojournal(); 1969 return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds, GFP_NOFS, type, line); }
*but* b) as I wrote in the email you are replying to, the jbd2_handle key is private per filesystem. Thus for lockdep to complain about jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep maps must come from the same filesystem. *and* c) filesystem without journal doesn't use jbd2_handle lockdep map at all so for such filesystems lockdep creates no dependency for jbd2_handle map.
What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case? Does this case happen on filesystem with journal, and can this case be executed by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for calling jbd2__journal_start() case the same? Also, I worry that jbd2__journal_restart() is problematic, for it calls stop_this_handle() (which calls memalloc_nofs_restore()) and then calls start_this_handle() (which fails to call memalloc_nofs_save() if an error occurs). An error from start_this_handle() from journal restart operation might need special handling (i.e. either remount-ro or panic) ?