Thread (33 messages) 33 messages, 6 authors, 2021-09-16

Re: [PATCH 5/6] XFS: remove congestion_wait() loop from kmem_alloc()

From: NeilBrown <hidden>
Date: 2021-09-14 03:28:05
Also in: linux-fsdevel, linux-mm, linux-nfs, linux-xfs, lkml

On Tue, 14 Sep 2021, Dave Chinner wrote:
On Tue, Sep 14, 2021 at 10:13:04AM +1000, NeilBrown wrote:
quoted
Documentation commment in gfp.h discourages indefinite retry loops on
ENOMEM and says of __GFP_NOFAIL that it

    is definitely preferable to use the flag rather than opencode
    endless loop around allocator.

So remove the loop, instead specifying __GFP_NOFAIL if KM_MAYFAIL was
not given.

Signed-off-by: NeilBrown <redacted>
---
 fs/xfs/kmem.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 6f49bf39183c..f545f3633f88 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -13,19 +13,11 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
 {
 	int	retries = 0;
 	gfp_t	lflags = kmem_flags_convert(flags);
-	void	*ptr;
 
 	trace_kmem_alloc(size, flags, _RET_IP_);
 
-	do {
-		ptr = kmalloc(size, lflags);
-		if (ptr || (flags & KM_MAYFAIL))
-			return ptr;
-		if (!(++retries % 100))
-			xfs_err(NULL,
-	"%s(%u) possible memory allocation deadlock size %u in %s (mode:0x%x)",
-				current->comm, current->pid,
-				(unsigned int)size, __func__, lflags);
-		congestion_wait(BLK_RW_ASYNC, HZ/50);
-	} while (1);
+	if (!(flags & KM_MAYFAIL))
+		lflags |= __GFP_NOFAIL;
+
+	return kmalloc(size, lflags);
 }
Which means we no longer get warnings about memory allocation
failing - kmem_flags_convert() sets __GFP_NOWARN for all allocations
in this loop. Hence we'll now get silent deadlocks through this code
instead of getting warnings that memory allocation is failing
repeatedly.
Yes, that is a problem.  Could we just clear __GFP_NOWARN when setting
__GFP_NOFAIL?
Or is the 1-in-100 important? I think default warning is 1 every 10
seconds.
I also wonder about changing the backoff behaviour here (it's a 20ms
wait right now because there are not early wakeups) will affect the
behaviour, as __GFP_NOFAIL won't wait for that extra time between
allocation attempts....
The internal backoff is 100ms if there is much pending writeout, and
there are 16 internal retries.  If there is not much pending writeout, I
think it just loops with cond_resched().
So adding 20ms can only be at all interesting when the only way to
reclaim memory is something other than writeout.  I don't know how to
think about that.
And, of course, how did you test this? Sometimes we see
unpredicted behaviours as a result of "simple" changes like this
under low memory conditions...
I suspect this is close to untestable.  While I accept that there might
be a scenario where the change might cause some macro effect, it would
most likely be some interplay with some other subsystem struggling with
memory.  Testing XFS by itself would be unlikely to find it.

Thanks,
NeilBrown

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help