Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work
From: dai.ngo@oracle.com
Date: 2023-01-10 19:09:21
On 1/10/23 10:53 AM, Chuck Lever III wrote:
quoted
On Jan 10, 2023, at 1:46 PM, Dai Ngo [off-list ref] wrote: On 1/10/23 10:17 AM, Chuck Lever III wrote:quoted
quoted
On Jan 10, 2023, at 12:33 PM, Dai Ngo [off-list ref] wrote: On 1/10/23 2:30 AM, Jeff Layton wrote:quoted
On Mon, 2023-01-09 at 22:48 -0800, Dai Ngo wrote:quoted
Currently nfsd4_state_shrinker_worker can be schduled multiple times from nfsd4_state_shrinker_count when memory is low. This causes the WARN_ON_ONCE in __queue_delayed_work to trigger. This patch allows only one instance of nfsd4_state_shrinker_worker at a time using the nfsd_shrinker_active flag, protected by the client_lock. Replace mod_delayed_work with queue_delayed_work since we don't expect to modify the delay of any pending work. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith <redacted> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> --- fs/nfsd/netns.h | 1 + fs/nfsd/nfs4state.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-)diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 8c854ba3285b..801d70926442 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h@@ -196,6 +196,7 @@ struct nfsd_net { atomic_t nfsd_courtesy_clients; struct shrinker nfsd_client_shrinker; struct delayed_work nfsd_shrinker_work; + bool nfsd_shrinker_active; }; /* Simple check to find out if a given net was properly initialized */diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index ee56c9466304..e00551af6a11 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c@@ -4407,11 +4407,20 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc) struct nfsd_net *nn = container_of(shrink, struct nfsd_net, nfsd_client_shrinker); + spin_lock(&nn->client_lock); + if (nn->nfsd_shrinker_active) { + spin_unlock(&nn->client_lock); + return 0; + }Is this extra machinery really necessary? The bool and spinlock don't seem to be needed. Typically there is no issue with calling queued_delayed_work when the work is already queued. It just returns false in that case without doing anything.When there are multiple calls to mod_delayed_work/queue_delayed_work we hit the WARN_ON_ONCE's in __queue_delayed_work and __queue_work if the work is queued but not execute yet.The delay argument of zero is interesting. If it's set to a value greater than zero, do you still see a problem?I tried and tried but could not reproduce the problem that Mike reported. I guess my VMs don't have fast enough cpus to make it happen.I'd prefer not to guess... it sounds like we don't have a clear root cause on this one yet.
Yes, we do, as I explained in above. The reason I could not reproduce it because my system is not fast enough to get multiple calls to nfsd4_state_shrinker_count simultaneously.
I think I agree with Jeff: a spinlock shouldn't be required to make queuing work safe via this API.quoted
As Jeff mentioned, delay 0 should be safe and we want to run the shrinker as soon as possible when memory is low.I suggested that because the !delay code paths seem to lead directly to the WARN_ONs in queue_work(). <shrug>
If we use 0 delay then we need the spinlock, as proven by Mike's test. If we use delay != 0 then it may work without the spinlock, we will Mike to retest it. You and Jeff decide what the delay is and I will make the change and retest. -Dai
quoted
-Daiquoted
quoted
This problem was reported by Mike. I initially tried with only the bool but that was not enough that was why the spinlock was added. Mike verified that the patch fixed the problem. -Daiquoted
quoted
count = atomic_read(&nn->nfsd_courtesy_clients); if (!count) count = atomic_long_read(&num_delegations); - if (count) - mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + if (count) { + nn->nfsd_shrinker_active = true; + spin_unlock(&nn->client_lock); + queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0); + } else + spin_unlock(&nn->client_lock); return (unsigned long)count; } @@ -6239,6 +6248,9 @@ nfsd4_state_shrinker_worker(struct work_struct *work) courtesy_client_reaper(nn); deleg_reaper(nn); + spin_lock(&nn->client_lock); + nn->nfsd_shrinker_active = 0; + spin_unlock(&nn->client_lock); } static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)-- Chuck Lever-- Chuck Lever