deadlock in RELEASE_LOCKOWNER path
From: Chuck Lever III <chuck.lever@oracle.com>
Date: 2024-02-04 23:06:23
Hi Neil-
I'm testing v6.8-rc3 + nfsd-next. This series includes:
nfsd: fix RELEASE_LOCKOWNER
and
nfsd: don't call locks_release_private() twice concurrently
======================================================
WARNING: possible circular locking dependency detected
6.8.0-rc3-00052-gc20ad5c2d60c #1 Not tainted
------------------------------------------------------
nfsd/1218 is trying to acquire lock:
ffff88814d754198 (&ctx->flc_lock){+.+.}-{2:2}, at: check_for_locks+0xf6/0x1d0 [nfsd]
but task is already holding lock:
ffff8881210e61f0 (&fp->fi_lock){+.+.}-{2:2}, at: check_for_locks+0x2d/0x1d0 [nfsd]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fp->fi_lock){+.+.}-{2:2}:
_raw_spin_lock+0x2f/0x42
nfsd_break_deleg_cb+0x295/0x2f6 [nfsd]
__break_lease+0x38b/0x864
break_lease+0x87/0xc2
do_dentry_open+0x5df/0xc8d
vfs_open+0xbb/0xc4
dentry_open+0x5a/0x7a
__nfsd_open.isra.0+0x1ed/0x2a3 [nfsd]
nfsd_open_verified+0x16/0x1c [nfsd]
nfsd_file_do_acquire+0x149c/0x1650 [nfsd]
nfsd_file_acquire+0x16/0x1c [nfsd]
nfsd4_commit+0x72/0x10c [nfsd]
nfsd4_proc_compound+0x12c5/0x17a8 [nfsd]
nfsd_dispatch+0x32f/0x590 [nfsd]
svc_process_common+0xb64/0x13b8 [sunrpc]
svc_process+0x3b8/0x40c [sunrpc]
svc_recv+0x1478/0x1803 [sunrpc]
nfsd+0x2a1/0x2e3 [nfsd]
kthread+0x2cb/0x2da
ret_from_fork+0x2a/0x62
ret_from_fork_asm+0x1b/0x30
-> #0 (&ctx->flc_lock){+.+.}-{2:2}:
__lock_acquire+0x1e49/0x27f7
lock_acquire+0x25c/0x3df
_raw_spin_lock+0x2f/0x42
check_for_locks+0xf6/0x1d0 [nfsd]
nfsd4_release_lockowner+0x2b9/0x3a4 [nfsd]
nfsd4_proc_compound+0x12c5/0x17a8 [nfsd]
nfsd_dispatch+0x32f/0x590 [nfsd]
svc_process_common+0xb64/0x13b8 [sunrpc]
svc_process+0x3b8/0x40c [sunrpc]
svc_recv+0x1478/0x1803 [sunrpc]
nfsd+0x2a1/0x2e3 [nfsd]
kthread+0x2cb/0x2da
ret_from_fork+0x2a/0x62
ret_from_fork_asm+0x1b/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&fp->fi_lock);
                               lock(&ctx->flc_lock);
                               lock(&fp->fi_lock);
  lock(&ctx->flc_lock);
*** DEADLOCK ***
2 locks held by nfsd/1218:
#0: ffff88823a3ccf90 (&clp->cl_lock#2){+.+.}-{2:2}, at: nfsd4_release_lockowner+0x22d/0x3a4 [nfsd]
#1: ffff8881210e61f0 (&fp->fi_lock){+.+.}-{2:2}, at: check_for_locks+0x2d/0x1d0 [nfsd]
stack backtrace:
CPU: 2 PID: 1218 Comm: nfsd Not tainted 6.8.0-rc3-00052-gc20ad5c2d60c #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.3 10/28/2020
Call Trace:
<TASK>
dump_stack_lvl+0x70/0xa4
dump_stack+0x10/0x16
print_circular_bug+0x37c/0x38f
check_noncircular+0x16d/0x19a
? __pfx_mark_lock+0x10/0x10
? __pfx_check_noncircular+0x10/0x10
? add_chain_block+0x142/0x19c
__lock_acquire+0x1e49/0x27f7
? __pfx___lock_acquire+0x10/0x10
? check_for_locks+0xf6/0x1d0 [nfsd]
lock_acquire+0x25c/0x3df
? check_for_locks+0xf6/0x1d0 [nfsd]
? __pfx_lock_acquire+0x10/0x10
? __kasan_check_write+0x14/0x1a
? do_raw_spin_lock+0x146/0x1ea
_raw_spin_lock+0x2f/0x42
? check_for_locks+0xf6/0x1d0 [nfsd]
check_for_locks+0xf6/0x1d0 [nfsd]
nfsd4_release_lockowner+0x2b9/0x3a4 [nfsd]
? __pfx_nfsd4_release_lockowner+0x10/0x10 [nfsd]
nfsd4_proc_compound+0x12c5/0x17a8 [nfsd]
? nfsd_read_splice_ok+0xe/0x1f [nfsd]
nfsd_dispatch+0x32f/0x590 [nfsd]
? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
? svc_reserve+0xed/0xfc [sunrpc]
svc_process_common+0xb64/0x13b8 [sunrpc]
? __pfx_svc_process_common+0x10/0x10 [sunrpc]
? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
? xdr_inline_decode+0x92/0x1cb [sunrpc]
svc_process+0x3b8/0x40c [sunrpc]
svc_recv+0x1478/0x1803 [sunrpc]
nfsd+0x2a1/0x2e3 [nfsd]
? __kthread_parkme+0xcc/0x19c
? __pfx_nfsd+0x10/0x10 [nfsd]
kthread+0x2cb/0x2da
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2a/0x62
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
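For anyone not used to reading these reports, the inversion in the splat above can be sketched in userspace as a tiny lock-order tracker. This is only an illustration under stated assumptions: it is not lockdep's implementation and not nfsd code. The names record_acquire(), reachable() and the lock_class enum are hypothetical; only the two lock names are taken from the splat.

```c
#include <stdbool.h>

/* Hypothetical lock classes, named after the two locks in the splat. */
enum lock_class { FI_LOCK, FLC_LOCK, NR_CLASSES };

/* dep[a][b] is true once some path has taken b while holding a. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Is 'to' reachable from 'from' by following recorded dependencies? */
static bool reachable(enum lock_class from, enum lock_class to)
{
	if (from == to)
		return true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && reachable((enum lock_class)next, to))
			return true;
	return false;
}

/*
 * Record "acquired is taken while held is held".
 * Returns false, and records nothing, if the opposite ordering is
 * already known, i.e. the new edge would close a cycle. Refusing such
 * edges keeps the graph acyclic, so reachable() always terminates.
 */
static bool record_acquire(enum lock_class held, enum lock_class acquired)
{
	if (reachable(acquired, held))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

Calling record_acquire(FLC_LOCK, FI_LOCK) first models chain #1, where the dependency reported by lockdep implies fi_lock is taken in nfsd_break_deleg_cb() with flc_lock already held. A later record_acquire(FI_LOCK, FLC_LOCK) models chain #0 in check_for_locks() and is rejected, which is the point at which the real lockdep prints the warning.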
I get a very similar splat while testing v5.15.y and v5.10.y
with "nfsd: fix RELEASE_LOCKOWNER" applied. I'm circling back
to v6.1.y as well, but I expect I will see the same there.
--
Chuck Lever