Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work
From: dai.ngo@oracle.com
Date: 2023-01-10 18:53:42
On 1/10/23 10:17 AM, Chuck Lever III wrote:
quoted
On Jan 10, 2023, at 12:33 PM, Dai Ngo [off-list ref] wrote: On 1/10/23 2:30 AM, Jeff Layton wrote:quoted
On Mon, 2023-01-09 at 22:48 -0800, Dai Ngo wrote:quoted
Currently nfsd4_state_shrinker_worker can be schduled multiple times from nfsd4_state_shrinker_count when memory is low. This causes the WARN_ON_ONCE in __queue_delayed_work to trigger. This patch allows only one instance of nfsd4_state_shrinker_worker at a time using the nfsd_shrinker_active flag, protected by the client_lock. Replace mod_delayed_work with queue_delayed_work since we don't expect to modify the delay of any pending work. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith <redacted> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4state.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-)diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 8c854ba3285b..801d70926442 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h@@ -196,6 +196,7 @@ struct nfsd_net { atomic_t nfsd_courtesy_clients; struct shrinker nfsd_client_shrinker; struct delayed_work nfsd_shrinker_work; + bool nfsd_shrinker_active; }; /* Simple check to find out if a given net was properly initialized */diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ee56c9466304..e00551af6a11 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c@@ -4407,11 +4407,20 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) struct nfsd_net *nn = container_of(shrink, struct nfsd_net, nfsd_client_shrinker); + spin_lock(&nn->client_lock); + if (nn->nfsd_shrinker_active) { + spin_unlock(&nn->client_lock); + return 0; + }Is this extra machinery really necessary? The bool and spinlock don't seem to be needed. Typically there is no issue with calling queued_delayed_work when the work is already queued. It just returns false in that case without doing anything.When there are multiple calls to mod_delayed_work/queue_delayed_work we hit the WARN_ON_ONCE's in __queue_delayed_work and __queue_work if the work is queued but not execute yet.The delay argument of zero is interesting. If it's set to a value greater than zero, do you still see a problem?
I tried and tried but could not reproduce the problem that Mike reported. I guess my VMs don't have fast enough cpus to make it happen. As Jeff mentioned, delay 0 should be safe and we want to run the shrinker as soon as possible when memory is low. -Dai
quoted
This problem was reported by Mike. I initially tried with only the bool but that was not enough that was why the spinlock was added. Mike verified that the patch fixed the problem. -Daiquoted
quoted
count = atomic_read(&nn->nfsd_courtesy_clients); if (!count) count = atomic_long_read(&num_delegations); - if (count) - mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + if (count) { + nn->nfsd_shrinker_active = true; + spin_unlock(&nn->client_lock); + queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + } else + spin_unlock(&nn->client_lock); return (unsigned long)count; } @@ -6239,6 +6248,9 @@ nfsd4_state_shrinker_worker(struct work_struct *work) courtesy_client_reaper(nn); deleg_reaper(nn); + spin_lock(&nn->client_lock); + nn->nfsd_shrinker_active = 0; + spin_unlock(&nn->client_lock); } static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)-- Chuck Lever