Re: [5.15-rc1 regression] io_uring: fsstress hangs in do_coredump() on exit

From: Dave Chinner <david@fromorbit.com>
Date: 2021-09-21 21:36:15
Also in: linux-fsdevel

On Tue, Sep 21, 2021 at 08:19:53AM -0600, Jens Axboe wrote:

On 9/21/21 7:25 AM, Jens Axboe wrote:

quoted

On 9/21/21 12:40 AM, Dave Chinner wrote:

quoted

Hi Jens,

I updated all my trees from 5.14 to 5.15-rc2 this morning and
immediately had problems running the recoveryloop fstest group on
them. These tests have a typical pattern of "run load in the
background, shutdown the filesystem, kill load, unmount and test
recovery".

Whent eh load includes fsstress, and it gets killed after shutdown,
it hangs on exit like so:

# echo w > /proc/sysrq-trigger 
[  370.669482] sysrq: Show Blocked State
[  370.671732] task:fsstress        state:D stack:11088 pid: 9619 ppid:  9615 flags:0x00000000
[  370.675870] Call Trace:
[  370.677067]  __schedule+0x310/0x9f0
[  370.678564]  schedule+0x67/0xe0
[  370.679545]  schedule_timeout+0x114/0x160
[  370.682002]  __wait_for_common+0xc0/0x160
[  370.684274]  wait_for_completion+0x24/0x30
[  370.685471]  do_coredump+0x202/0x1150
[  370.690270]  get_signal+0x4c2/0x900
[  370.691305]  arch_do_signal_or_restart+0x106/0x7a0
[  370.693888]  exit_to_user_mode_prepare+0xfb/0x1d0
[  370.695241]  syscall_exit_to_user_mode+0x17/0x40
[  370.696572]  do_syscall_64+0x42/0x80
[  370.697620]  entry_SYSCALL_64_after_hwframe+0x44/0xae

It's 100% reproducable on one of my test machines, but only one of
them. That one machine is running fstests on pmem, so it has
synchronous storage. Every other test machine using normal async
storage (nvme, iscsi, etc) and none of them are hanging.

A quick troll of the commit history between 5.14 and 5.15-rc2
indicates a couple of potential candidates. The 5th kernel build
(instead of ~16 for a bisect) told me that commit 15e20db2e0ce
("io-wq: only exit on fatal signals") is the cause of the
regression. I've confirmed that this is the first commit where the
problem shows up.

Thanks for the report Dave, I'll take a look. Can you elaborate on
exactly what is being run? And when killed, it's a non-fatal signal?

It's whatever kill/killall sends by default.  Typical behaviour that
causes a hang is something like:

$FSSTRESS_PROG -n10000000 -p $PROCS -d $load_dir >> $seqres.full 2>&1 &
....
sleep 5
_scratch_shutdown
$KILLALL_PROG -q $FSSTRESS_PROG
wait

_shutdown_scratch is typically just an 'xfs_io -rx -c "shutdown"
/mnt/scratch' command that shuts down the filesystem. Other tests in
the recoveryloop group use DM targets to fail IO that trigger a
shutdown, others inject errors that trigger shutdowns, etc. But the
result is that all hang waiting for fsstress processes that have
been using io_uring to exit.

Just run fstests with "./check -g recoveryloop" - there's only a
handful of tests and it only takes about 5 minutes to run them all
on a fake DRAM based pmem device..

quoted hunk ↗ jump to hunk

Can you try with this patch?

diff --git a/fs/io-wq.c b/fs/io-wq.c
index b5fd015268d7..1e55a0a2a217 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c

@@ -586,7 +586,8 @@ static int io_wqe_worker(void *data)
 
 			if (!get_signal(&ksig))
 				continue;
-			if (fatal_signal_pending(current))
+			if (fatal_signal_pending(current) ||
+			    signal_group_exit(current->signal)) {
 				break;
 			continue;
 		}

Cleaned up so it compiles and the tests run properly again. But
playing whack-a-mole with signals seems kinda fragile. I was pointed
to this patchset by another dev on #xfs overnight who saw the same
hangs that also fixed the hang:

https://lore.kernel.org/lkml/cover.1629655338.git.olivier@trillion01.com/ (local)

It was posted about a month ago and I don't see any response to it
on the lists...

Cheers,

Dave,
-- 
Dave Chinner
david@fromorbit.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help