Re: [PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM

From: Kent Overstreet <kent.overstreet@linux.dev>
Date: 2024-09-04 16:06:03
Also in: linux-bcachefs, linux-fsdevel, linux-mm, lkml

On Wed, Sep 04, 2024 at 09:14:29AM GMT, Michal Hocko wrote:

On Tue 03-09-24 19:53:41, Kent Overstreet wrote:
[...]

quoted

However, if we agreed that GFP_NOFAIL meant "only fail if it is not
possible to satisfy this allocation" (and I have been arguing that that
is the only sane meaning) - then that could lead to a lot of error paths
getting simpler.

Because there are a lot of places where there's essentially no good
reason to bubble up an -ENOMEM to userspace; if we're actually out of
memory the current allocation is just one out of many and not
particularly special, better to let the oom killer handle it...

This is exactly GFP_KERNEL semantic for low order allocations or
kvmalloc for that matter. They simply never fail unless couple of corner
cases - e.g. the allocating task is an oom victim and all of the oom
memory reserves have been consumed. This is where we call "not possible
to allocate".

*nod*

Which does beg the question of why GFP_NOFAIL exists.

quoted

So the error paths would be more along the lines of "there's a bug, or
userspace has requested something crazy, just shut down gracefully".

How do you expect that to be done? Who is going to go over all those
GFP_NOFAIL users? And what kind of guide lines should they follow? It is
clear that they believe they cannot handle the failure gracefully
therefore they have requested GFP_NOFAIL. Many of them do not have
return value to return.

They can't handle the allocatian failure and continue normal operation,
but that's entirely different from not being able to handle the
allocation failure at all - it's not hard to do an emergency shutdown,
that's a normal thing for filesystems to do.

And if you scan for GFP_NOFAIL uses in the kernel, a decent number
already do just that.

So really what do you expect proper GFP_NOFAIL users to do and what
should happen to those that are requesting unsupported size or
allocation mode?

Emergency shutdwon.

And I'm not saying we have to rush to fix all the existing callers;
they're clearly in existing well tested code, there's not much point to
that.

Additionally most of them are deep in the guts of filesystem transaction
code where call paths to that site are limited - they're not library
code that gets called by anything.

But as a matter of policy going forward, yes we should be saying that
even GFP_NOFAIL allocations should be checking for -ENOMEM.

Yes, we need to define some reasonable maximum supported sizes. For the
page allocator this has been order > 1 and we considering we have a
warning about those requests for years without a single report then we
can assume we do not have such abusers. for kvmalloc to story is
different. Current INT_MAX is just not any practical limit. Past
experience says that anything based on the amount of memory just doesn't
work (e.g. hash table sizes that used to that scaling and there are
other examples). So we should be practical here and look at existing
users and see what they really need and put a cap above that.

Not following what you're saying about hash tables? Hash tables scale
roughly with the amount of system memory/workingset.

But it seems to me that the limit should be lower if you're on e.g. a 2
GB machine (not failing with a warning, just failing immediately rather
than oom killing a bunch of stuff first) - and it's going to need to be
raised above INT_MAX as large memory machines keep growing, I keep
hitting it in bcachefs fsck code.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help