Re: [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return

From: Trond Myklebust <hidden>
Date: 2023-06-29 18:33:35

On Tue, 2023-06-27 at 14:31 -0400, Benjamin Coddington wrote:

quoted hunk ↗ jump to hunk

Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during
delegation
return") attempted to solve this problem by using nfs4's generic
async error
handling, but introduced a regression where v4.0 lock recovery would
hang.
The additional complexity introduced by overloading that error
handling is
not necessary for this case.

The problem as originally explained in the above commit is:

    There's a small window where a LOCK sent during a delegation
return can
    race with another OPEN on client, but the open stateid has not
yet been
    updated.  In this case, the client doesn't handle the OLD_STATEID
error
    from the server and will lose this lock, emitting:
    "NFS: nfs4_handle_delegation_recall_error: unhandled error -
10024".

We want a fix that is much more focused to the original problem.  Fix
this
issue by returning -EAGAIN from the
nfs4_handle_delegation_recall_error() on
OLD_STATEID, and use that as a signal for the delegation return code
to
retry the LOCK operation.  We should at this point be able to send
along
the updated stateid.

Signed-off-by: Benjamin Coddington <redacted>
---
 fs/nfs/delegation.c | 4 +++-
 fs/nfs/nfs4proc.c   | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index cf7365581031..23aeb02319a5 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c

@@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct

nfs4_state *state, const nfs4_state
                if (nfs_file_open_context(fl->fl_file)->state !=
state)
                        continue;
                spin_unlock(&flctx->flc_lock);
-               status = nfs4_lock_delegation_recall(fl, state,
stateid);
+               do {
+                       status = nfs4_lock_delegation_recall(fl,
state, stateid);
+               } while (status == -EAGAIN);
                if (status < 0)
                        goto out;
                spin_lock(&flctx->flc_lock);

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6bb14f6cfbc0..399db73a57f4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c

@@ -2262,6 +2262,7 @@ static int

nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
                case -NFS4ERR_BAD_HIGH_SLOT:
                case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
                case -NFS4ERR_DEADSESSION:
+               case -NFS4ERR_OLD_STATEID:
                        return -EAGAIN;

Hmm... Rather than issuing a blanket EAGAIN, we really should be
looking at using either nfs4_refresh_lock_old_stateid() or
nfs4_refresh_open_old_stateid(), depending on whether the stateid that
saw the NFS4ERR_OLD_STATEID was a lock stateid or an open stateid.

                case -NFS4ERR_STALE_CLIENTID:
                case -NFS4ERR_STALE_STATEID:

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help