Re: possible deadlock in start_this_handle (2)

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: 2021-02-19 10:18:06
Also in: linux-mm, lkml

On 2021/02/15 23:29, Jan Kara wrote:

On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:

quoted

On 2021/02/15 21:45, Jan Kara wrote:

quoted

On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:

quoted

Excuse me, but it seems to me that nothing prevents
ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
Will you explain when ext4_get_nojournal() path is executed?

That's a good question but sadly I don't think that's it.
ext4_get_nojournal() is called when the filesystem is created without a
journal. In that case we also don't acquire jbd2_handle lockdep map. In the
syzbot report we can see:

Since syzbot can test filesystem images, syzbot might have tested a filesystem
image created both with and without journal within this boot.

a) I think that syzbot reboots the VM between executing different tests to
get reproducible conditions. But in theory I agree the test may have
contained one image with and one image without a journal.

syzkaller reboots the VM upon a crash.
syzkaller can test multiple filesystem images within one boot.

https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this
statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered
( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller.

      /* Just increment the non-pointer handle value */
      static handle_t *ext4_get_nojournal(void)
      {
   86         handle_t *handle = current->journal_info;
              unsigned long ref_cnt = (unsigned long)handle;
      
              BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
      
   86         ref_cnt++;
              handle = (handle_t *)ref_cnt;
      
              current->journal_info = handle;
 2006         return handle;
      }
      
      
      /* Decrement the non-pointer handle value */
      static void ext4_put_nojournal(handle_t *handle)
      {
              unsigned long ref_cnt = (unsigned long)handle;
      
   85         BUG_ON(ref_cnt == 0);
      
   85         ref_cnt--;
              handle = (handle_t *)ref_cnt;
      
              current->journal_info = handle;
      }


      handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
                                        int type, int blocks, int rsv_blocks,
                                        int revoke_creds)
      {
              journal_t *journal;
              int err;
      
 2006         trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds,
 2006                                  _RET_IP_);
 2006         err = ext4_journal_check_start(sb);
              if (err < 0)
                      return ERR_PTR(err);
      
 2005         journal = EXT4_SB(sb)->s_journal;
 1969         if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
 2006                 return ext4_get_nojournal();
 1969         return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
                                         GFP_NOFS, type, line);
      }

*but*

b) as I wrote in the email you are replying to, the jbd2_handle key is
private per filesystem. Thus for lockdep to complain about
jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
maps must come from the same filesystem.

*and*

c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
for such filesystems lockdep creates no dependency for jbd2_handle map.

What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case?
Does this case happen on filesystem with journal, and can this case be executed
by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are
the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for
calling jbd2__journal_start() case the same?

Also, I worry that jbd2__journal_restart() is problematic, for it calls
stop_this_handle() (which calls memalloc_nofs_restore()) and then calls
start_this_handle() (which fails to call memalloc_nofs_save() if an error
occurs). An error from start_this_handle() from journal restart operation
might need special handling (i.e. either remount-ro or panic) ?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help