Re: [PATCH] nfsd: fix potential race in nfs4_find_file
From: Chuck Lever III <hidden>
Date: 2023-01-05 21:37:04
On Jan 5, 2023, at 3:43 PM, Jeff Layton [off-list ref] wrote: On Thu, 2023-01-05 at 14:46 +0000, Chuck Lever III wrote:quoted
quoted
On Jan 5, 2023, at 7:18 AM, Jeff Layton [off-list ref] wrote: Even though there is a WARN_ON_ONCE check, it seems possible for nfs4_find_file to race with the destruction of an fi_deleg_file while trying to take a reference to it. put_deleg_file is done while holding the fi_lock. Take and hold it when dealing with the fi_deleg_file in nfs4_find_file. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/nfsd/nfs4state.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index b68238024e49..3df3ae84bd07 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c@@ -6417,23 +6417,27 @@ nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,static struct nfsd_file * nfs4_find_file(struct nfs4_stid *s, int flags) { + struct nfsd_file *ret = NULL; + if (!s) return NULL; switch (s->sc_type) { case NFS4_DELEG_STID: - if (WARN_ON_ONCE(!s->sc_file->fi_deleg_file)) - return NULL; - return nfsd_file_get(s->sc_file->fi_deleg_file); + spin_lock(&s->sc_file->fi_lock); + if (!WARN_ON_ONCE(!s->sc_file->fi_deleg_file))You'd think this would be a really really hard race to hit. What I'm wondering, though, is whether the WARN_ON_ONCE should be dropped by this patch. I've never seen it fire.I have: https://bugzilla.redhat.com/show_bug.cgi?id=1997177
It's possible though that those WARNs are fallout from other bugs in the delegation handling, but it's hard to know for sure.
Before 2015 there were a bunch of BUG_ON's in this code that were converted to WARN after Linus complained. Before that, I think these were all debugging sentinels. (in which case I would argue they might be better recast as tracepoints, but that's for another day).
I think we ought to keep it there for now.
The question is whether the WARN_ON is adding value for customers. Can they do something about it? If they give us this information, can we do something about it? I can't tell from the warning whether the problem is due to a server bug or valid client behavior. Both the server and the client workload appear to survive. So, I just don't feel like it's adding value, and firing a WARN while holding a spinlock makes me squidgy.
quoted
quoted
+ ret = nfsd_file_get(s->sc_file->fi_deleg_file); + spin_unlock(&s->sc_file->fi_lock); + break; case NFS4_OPEN_STID: case NFS4_LOCK_STID: if (flags & RD_STATE) - return find_readable_file(s->sc_file); + ret = find_readable_file(s->sc_file); else - return find_writeable_file(s->sc_file); + ret = find_writeable_file(s->sc_file); } - return NULL; + return ret; } static __be32 -- 2.39.0-- Chuck Lever-- Jeff Layton [off-list ref]
-- Chuck Lever