Thread (25 messages) 25 messages, 7 authors, 2021-06-15

Re: [syzbot] possible deadlock in del_gendisk

From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Date: 2021-06-13 11:02:11

On 2021/06/12 11:35, Tetsuo Handa wrote:
On 2021/06/12 0:49, Tetsuo Handa wrote:
quoted
On 2021/06/12 0:18, Pavel Tatashin wrote:
quoted
quoted
quoted
Well, I made commit 310ca162d779efee ("block/loop: Use global lock for ioctl() operation.")
because per device lock was not sufficient. Did commit 6cc8e7430801fa23 ("loop: scale loop
device by introducing per device lock") take this problem into account?
This was my intention when I wrote 6cc8e7430801fa23 ("loop: scale loop
device by introducing per device lock"). This is why this change does
not simply revert 310ca162d779efee ("block/loop: Use global lock for
ioctl() operation."), but keeps loop_ctl_mutex to protect the global
accesses.  loop_control_ioctl() is still locked by global
loop_ctl_mutex.
No, loop_control_ioctl() (i.e. /dev/loop-control) is irrelevant here.
What 310ca162d779efee addressed but (I worry) 6cc8e7430801fa23 broke is
lo_ioctl() (i.e. /dev/loop$num).

syzbot was reporting NULL pointer dereference which is caused by
race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other
loop devices at loop_validate_file() without holding corresponding
lo->lo_mutex lock.
Here is a reproducer and a race window widening patch.
Since loop_validate_file() traverses on other loop devices,
changing fd of loop device needs to be protected using a global lock.
At least LOOP_SET_FD, LOOP_CONFIGURE, LOOP_CHANGE_FD, LOOP_CLR_FD needs to be
serialized using a global lock because loop_validate_file() traverses on other
loop devices which can cause NULL pointer dereference like syzbot has reported
in the past.

And I think LOOP_SET_CAPACITY, LOOP_SET_DIRECT_IO, LOOP_SET_BLOCK_SIZE also
needs to be serialized using a global lock because loop_change_fd() checks
capacity and dio state of other loop device which is not serialized.

I'd like to apply

  [PATCH] loop: drop loop_ctl_mutex around del_gendisk() in loop_remove()

as soon as possible because it is current top crasher (crashing on every 39 seconds).

I suggest simply reverting commit 6cc8e7430801fa23 ("loop: scale loop device by
introducing per device lock") for now. Do you want to revert 6cc8e7430801fa23
before my patch? If yes, I'll update my patch. If no, I'll just ask Jens to send
my patch to Linus.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help