Thread (24 messages) 24 messages, 4 authors, 2022-01-20

Re: [PATCH v2 2/2] loop: use task_work for autoclear operation

From: Jan Kara <jack@suse.cz>
Date: 2022-01-13 10:44:34

On Wed 12-01-22 14:16:15, Jan Kara wrote:
On Tue 11-01-22 00:08:56, Tetsuo Handa wrote:
quoted
On 2022/01/10 22:42, Jan Kara wrote:
quoted
a) We didn't fully establish a real deadlock scenario from the lockdep
report, did we? The lockdep report involved suspend locks, some locks on
accessing files in /proc etc. and it was not clear whether it all reflects
a real deadlock possibility or just a fact that lockdep tracking is rather
coarse-grained at times. Now lockdep report is unpleasant and loop device
locking was ugly anyway so your async change made sense but once lockdep is
silenced we should really establish whether there is real deadlock and more
work is needed or not.
Not /proc files but /sys/power/resume file.
Here is a reproducer but I can't test whether we can trigger a real deadlock.

----------------------------------------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/loop.h>
#include <sys/sendfile.h>

int main(int argc, char *argv[])
{
	const int file_fd = open("testfile", O_RDWR | O_CREAT, 0600);
	ftruncate(file_fd, 1048576);
	char filename[128] = { };
	const int loop_num = ioctl(open("/dev/loop-control", 3), LOOP_CTL_GET_FREE, 0);
	snprintf(filename, sizeof(filename) - 1, "/dev/loop%d", loop_num);
	const int loop_fd_1 = open(filename, O_RDWR);
	ioctl(loop_fd_1, LOOP_SET_FD, file_fd);
	const int loop_fd_2 = open(filename, O_RDWR);
	ioctl(loop_fd_1, LOOP_CLR_FD, 0);
	const int sysfs_fd = open("/sys/power/resume", O_RDWR);
	sendfile(file_fd, sysfs_fd, 0, 1048576);
	sendfile(loop_fd_2, file_fd, 0, 1048576);
	write(sysfs_fd, "700", 3);
	close(loop_fd_2);
	close(loop_fd_1); // Lockdep complains.
	close(file_fd);
	return 0;
}
----------------------------------------
Thanks for the reproducer. I will try to simplify it even more and figure
out whether there is a real deadlock potential in the lockdep complaint or
not...
OK, so I think I understand the lockdep complaint better. Lockdep
essentially complains about the following scenario:

blkdev_put()
  lock disk->open_mutex
  lo_release
    __loop_clr_fd()
        |
        | wait for IO to complete
        v
loop worker
  write to backing file
    sb_start_write(file)
        |
        | wait for fs with backing file to unfreeze
        v
process holding fs frozen
  freeze_super()
        |
        | wait for remaining writers on the fs so that fs can be frozen
        v
sendfile()
  sb_start_write()
  do_splice_direct()
        |
        | blocked on read from /sys/power/resume, waiting for kernfs file
	| lock
        v
write() "/dev/loop0" (in whatever form) to /sys/power/resume
  calls blkdev_get_by_dev("/dev/loop0")
    lock disk->open_mutex => deadlock

So there are three other ingredients to this locking loop besides loop device
behavior:
1) sysfs files which serialize reads & writes
2) sendfile which holds freeze protection while reading data to write
3) "resume" file behavior opening bdev from write to sysfs file

We cannot sensibly do anything about 1) AFAICT. You cannot get a coherent
view of a sysfs file while it is changing.

We could actually change 2) (to only hold freeze protection while splicing
from pipe) but that will fix the problem only for sendfile(2). E.g.
splice(2) may also block waiting for data to become available in the pipe
while holding freeze protection. Changing that would require some surgery
in our splice infrastructure and at this point I'm not sure whether we
would not introduce other locking problems due to pipe_lock lock ordering.

For 3), we could consider stuff like not resuming from a loop device or
postponing resume to a workqueue but it all looks ugly.

Maybe the most disputable thing in this locking chain seems to be splicing
from sysfs files. That does not seem terribly useful and due to special
locking and behavior of sysfs files it allows for creating interesting lock
dependencies. OTOH maybe there is someone out there who (possibly
inadvertedly through some library) ends up using splice on sysfs files so
chances for userspace breakage, if we disable splice for sysfs, would be
non-negligible. Hum, tough.

								Honza

-- 
Jan Kara [off-list ref]
SUSE Labs, CR
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help