Re: [syzbot] possible deadlock in del_gendisk
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: 2021-06-11 15:49:35
On 2021/06/12 0:18, Pavel Tatashin wrote:
> > Well, I made commit 310ca162d779efee ("block/loop: Use global lock
> > for ioctl() operation.") because per device lock was not sufficient.
> > Did commit 6cc8e7430801fa23 ("loop: scale loop device by introducing
> > per device lock") take this problem into account?
>
> This was my intention when I wrote 6cc8e7430801fa23 ("loop: scale loop
> device by introducing per device lock"). This is why this change does
> not simply revert 310ca162d779efee ("block/loop: Use global lock for
> ioctl() operation."), but keeps loop_ctl_mutex to protect the global
> accesses. loop_control_ioctl() is still locked by global loop_ctl_mutex.
No, loop_control_ioctl() (i.e. /dev/loop-control) is irrelevant here.
What 310ca162d779efee addressed but (I worry) 6cc8e7430801fa23 broke is
lo_ioctl() (i.e. /dev/loop$num).
syzbot was reporting a NULL pointer dereference caused by a race
condition between ioctl(loop_fd, LOOP_CLR_FD, 0) and
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd): loop_validate_file()
traverses other loop devices without holding the corresponding
lo->lo_mutex lock.
For example, loop_change_fd("/dev/loop0") calls loop_validate_file()
with only "/dev/loop0"->lo_mutex held. Then, loop_validate_file() finds
that is_loop_device("/dev/loop0") == true and enters the "while" loop.
In the "while" loop, there is
	if (l->lo_state != Lo_bound) {
		return -EINVAL;
	}
	f = l->lo_backing_file;
which has a race window: l->lo_backing_file can suddenly become NULL
between these statements because __loop_clr_fd("/dev/loop1") is doing
	lo->lo_backing_file = NULL;
with only "/dev/loop1"->lo_mutex held.
In other words, loop_validate_file() is a global access which is
no longer protected by loop_ctl_mutex, isn't it?
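
To make the locking question concrete, here is a minimal userspace sketch of the traversal, assuming pthread mutexes in place of kernel mutexes. It is not kernel code and not a proposed patch: struct loop_model, struct file_model, global_lock, validate_file_model() and clr_fd_model() are hypothetical names that only mirror the kernel ones. It shows one way the window closes when both the chain walk and the clear path serialize on a single global lock, so that Lo_bound always implies a non-NULL backing file during the walk.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; names only mirror the kernel's. */
enum lo_state_model { Lo_unbound, Lo_bound };

struct loop_model;

struct file_model {
	struct loop_model *loop;	/* non-NULL if this file is a loop device */
};

struct loop_model {
	pthread_mutex_t lo_mutex;		/* per-device lock */
	enum lo_state_model lo_state;
	struct file_model *lo_backing_file;
};

/* Assumed stand-in for a global lock covering cross-device accesses. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Walk the chain of backing files. Because the walk reads state that
 * belongs to *other* devices, it holds the global lock for the whole
 * traversal instead of relying on the caller's own lo_mutex.
 */
static int validate_file_model(struct loop_model *self, struct file_model *file)
{
	struct file_model *f = file;
	int ret = 0;

	pthread_mutex_lock(&global_lock);
	while (f->loop != NULL) {
		struct loop_model *l = f->loop;

		if (l == self) {		/* would create a cycle */
			ret = -EBADF;
			break;
		}
		if (l->lo_state != Lo_bound) {
			ret = -EINVAL;
			break;
		}
		/* Holds only because clr_fd_model() takes global_lock too. */
		assert(l->lo_backing_file != NULL);
		f = l->lo_backing_file;
	}
	pthread_mutex_unlock(&global_lock);
	return ret;
}

/* Clear path: global lock first, then the per-device lock. */
static void clr_fd_model(struct loop_model *lo)
{
	pthread_mutex_lock(&global_lock);
	pthread_mutex_lock(&lo->lo_mutex);
	lo->lo_backing_file = NULL;
	lo->lo_state = Lo_unbound;
	pthread_mutex_unlock(&lo->lo_mutex);
	pthread_mutex_unlock(&global_lock);
}
```

If clr_fd_model() took only lo->lo_mutex, the state check and the lo_backing_file read in the walk could be separated by the clear, which is the window described above.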