Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work

From: Jeff Layton <jlayton@kernel.org>
Date: 2023-01-10 19:16:59
Subsystem: filesystems (vfs and infrastructure), kernel nfsd, sunrpc, and lockd servers, the rest · Maintainers: Alexander Viro, Christian Brauner, Chuck Lever, Jeff Layton, Linus Torvalds

On Tue, 2023-01-10 at 18:53 +0000, Chuck Lever III wrote:

quoted

On Jan 10, 2023, at 1:46 PM, Dai Ngo [off-list ref] wrote:


On 1/10/23 10:17 AM, Chuck Lever III wrote:

quoted

On Jan 10, 2023, at 12:33 PM, Dai Ngo [off-list ref] wrote:


On 1/10/23 2:30 AM, Jeff Layton wrote:

quoted

On Mon, 2023-01-09 at 22:48 -0800, Dai Ngo wrote:

quoted

Currently nfsd4_state_shrinker_worker can be schduled multiple times
from nfsd4_state_shrinker_count when memory is low. This causes
the WARN_ON_ONCE in __queue_delayed_work to trigger.

This patch allows only one instance of nfsd4_state_shrinker_worker
at a time using the nfsd_shrinker_active flag, protected by the
client_lock.

Replace mod_delayed_work with queue_delayed_work since we
don't expect to modify the delay of any pending work.

Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
Reported-by: Mike Galbraith <redacted>
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/nfsd/netns.h     |  1 +
 fs/nfsd/nfs4state.c | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 8c854ba3285b..801d70926442 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h

@@ -196,6 +196,7 @@ struct nfsd_net {
 	atomic_t		nfsd_courtesy_clients;
 	struct shrinker		nfsd_client_shrinker;
 	struct delayed_work	nfsd_shrinker_work;
+	bool			nfsd_shrinker_active;
 };
   /* Simple check to find out if a given net was properly initialized */

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ee56c9466304..e00551af6a11 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c

@@ -4407,11 +4407,20 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
 	struct nfsd_net *nn = container_of(shrink,
 			struct nfsd_net, nfsd_client_shrinker);
 +	spin_lock(&nn->client_lock);
+	if (nn->nfsd_shrinker_active) {
+		spin_unlock(&nn->client_lock);
+		return 0;
+	}

Is this extra machinery really necessary? The bool and spinlock don't
seem to be needed. Typically there is no issue with calling
queued_delayed_work when the work is already queued. It just returns
false in that case without doing anything.

When there are multiple calls to mod_delayed_work/queue_delayed_work
we hit the WARN_ON_ONCE's in __queue_delayed_work and __queue_work if
the work is queued but not execute yet.

The delay argument of zero is interesting. If it's set to a value
greater than zero, do you still see a problem?

I tried and tried but could not reproduce the problem that Mike
reported. I guess my VMs don't have fast enough cpus to make it
happen.

I'd prefer not to guess... it sounds like we don't have a clear
root cause on this one yet.

I think I agree with Jeff: a spinlock shouldn't be required to
make queuing work safe via this API.

quoted

As Jeff mentioned, delay 0 should be safe and we want to run
the shrinker as soon as possible when memory is low.

I suggested that because the !delay code paths seem to lead
directly to the WARN_ONs in queue_work(). <shrug>


One of the WARNs in that Mike hit was this:

        WARN_ON_ONCE(timer->function != delayed_work_timer_fn);

nfsd isn't doing anything exotic with that function pointer, so that
really looks like something got corrupted. Given that this is happening
under low-memory conditions, then I have to wonder if we're just ending
up with a workqueue job that remained on the queue after the nfsd_net
got freed and recycled.

I'd start with a patch like this (note, untested):

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2f4a2449b314..86da6663806e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c

@@ -8158,6 +8158,7 @@ nfs4_state_shutdown_net(struct net *net)
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
        unregister_shrinker(&nn->nfsd_client_shrinker);
+       cancel_delayed_work_sync(&nn->nfsd_shrinker_work);
        cancel_delayed_work_sync(&nn->laundromat_work);
        locks_end_grace(&nn->nfsd4_manager);

Either way, I think longer nfsd_shrinker_work ought to be converted to a
normal work_struct since you don't ever use the delay.

quoted

-Dai

quoted

This problem was reported by Mike. I initially tried with only the
bool but that was not enough that was why the spinlock was added.
Mike verified that the patch fixed the problem.

-Dai

quoted

 	count = atomic_read(&nn->nfsd_courtesy_clients);
 	if (!count)
 		count = atomic_long_read(&num_delegations);
-	if (count)
-		mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
+	if (count) {
+		nn->nfsd_shrinker_active = true;
+		spin_unlock(&nn->client_lock);
+		queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
+	} else
+		spin_unlock(&nn->client_lock);
 	return (unsigned long)count;
 }
 @@ -6239,6 +6248,9 @@ nfsd4_state_shrinker_worker(struct work_struct *work)
   	courtesy_client_reaper(nn);
 	deleg_reaper(nn);
+	spin_lock(&nn->client_lock);
+	nn->nfsd_shrinker_active = 0;
+	spin_unlock(&nn->client_lock);
 }
   static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)

--
Chuck Lever

--
Chuck Lever

-- 
Jeff Layton [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help