Thread (29 messages) 29 messages, 6 authors, 2021-10-22

Re: [PATCH 2/6] MM: improve documentation for __GFP_NOFAIL

From: Michal Hocko <mhocko@suse.com>
Date: 2021-10-13 08:27:05
Also in: linux-doc, linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml

On Wed 13-10-21 13:32:31, Dave Chinner wrote:
On Mon, Oct 11, 2021 at 01:57:36PM +0200, Michal Hocko wrote:
quoted
On Sat 09-10-21 09:36:49, Dave Chinner wrote:
quoted
On Fri, Oct 08, 2021 at 09:48:39AM +0200, Michal Hocko wrote:
quoted
quoted
quoted
quoted
Even the API constaints of kvmalloc() w.r.t. only doing the vmalloc
fallback if the gfp context is GFP_KERNEL - we already do GFP_NOFS
kvmalloc via memalloc_nofs_save/restore(), so this behavioural
restriction w.r.t. gfp flags just makes no sense at all.
GFP_NOFS (without using the scope API) has the same problem as NOFAIL in
the vmalloc. Hence it is not supported. If you use the scope API then
you can GFP_KERNEL for kvmalloc. This is clumsy but I am not sure how to
define these conditions in a more sensible way. Special case NOFS if the
scope api is in use? Why do you want an explicit NOFS then?
Exactly my point - this is clumsy and a total mess. I'm not asking
for an explicit GFP_NOFS, just pointing out that the documented
restrictions that "vmalloc can only do GFP_KERNEL allocations" is
completely wrong.

vmalloc()
{
	if (!(gfp_flags &  __GFP_FS))
		memalloc_nofs_save();
	p = __vmalloc(gfp_flags | GFP_KERNEL)
	if (!(gfp_flags &  __GFP_FS))
		memalloc_nofs_restore();
}

Yup, that's how simple it is to support GFP_NOFS support in
vmalloc().
Yes, this would work from the functionality POV but it defeats the
philosophy behind the scope API. Why would you even need this if the
scope was defined by the caller of the allocator?
Who actually cares that vmalloc might be using the scoped API
internally to implement GFP_NOFS or GFP_NOIO? Nobody at all.
It is far more useful (and self documenting!) for one-off allocations
to pass a GFP_NOFS flag than it is to use a scope API...
I would agree with you if the explicit GFP_NOFS usage was consistent
and actually justified in the majority cases. My experience tells me
otherwise though. Many filesystems use the flag just because that is
easier. That leads to a huge overuse of the flag that leads to practical
problems.

I was hoping that if we offer an API that would define problematic
reclaim recursion scopes then it would reduce the abuse. I haven't
expected this to happen overnight but it is few years and it seems
it will not happen soon either.

[...]
quoted
quoted
It also points out that the scope API is highly deficient.
We can do GFP_NOFS via the scope API, but we can't
do anything else because *there is no scope API for other GFP
flags*.

Why don't we have a GFP_NOFAIL/__GFP_RETRY_FOREVER scope API?
NO{FS,IO} where first flags to start this approach. And I have to admit
the experiment was much less successful then I hoped for. There are
still thousands of direct NOFS users so for some reason defining scopes
is not an easy thing to do.

I am not against NOFAIL scopes in principle but seeing the nofs
"success" I am worried this will not go really well either and it is
much more tricky as NOFAIL has much stronger requirements than NOFS.
Just imagine how tricky this can be if you just call a library code
that is not under your control within a NOFAIL scope. What if that
library code decides to allocate (e.g. printk that would attempt to do
an optimistic NOWAIT allocation).
I already asked you that _exact_ question earlier in the thread
w.r.t.  kvmalloc(GFP_NOFAIL) using optimistic NOWAIT kmalloc
allocation. I asked you as a MM expert to define *and document* the
behaviour that should result, not turn around and use the fact that
it is undefined behaviour as a "this is too hard" excuse for not
changing anything.
Dave, you have "thrown" a lot of complains in previous emails and it is
hard to tell rants from features requests apart. I am sorry but I
believe it would be much more productive to continue this discussion if
you could mild your tone.

Can I ask you to break down your feature requests into separate emails
so that we can discuss and track them separately rather in this quite a
long thread which has IMHO diverghed from the initial topic. Thanks!
THe fact is that the scope APIs are only really useful for certain
contexts where restrictions are set by higher level functionality.
For one-off allocation constraints the API sucks and we end up with
Could you be more specific about these one-off allocation constrains?
What would be the reason to define one-off NO{FS,IO} allocation
constrain? Or did you have your NOFAIL example in mind?
crap like this (found in btrfs):

                /*                                                               
                 * We're holding a transaction handle, so use a NOFS memory      
                 * allocation context to avoid deadlock if reclaim happens.      
                 */                                                              
                nofs_flag = memalloc_nofs_save();                                
                value = kmalloc(size, GFP_KERNEL);                               
                memalloc_nofs_restore(nofs_flag);                                
Yes this looks wrong indeed! If I were to review such a code I would ask
why the scope cannot match the transaction handle context. IIRC jbd does
that.

I am aware of these patterns. I was pulled in some discussions in the
past and in some it turned out that the constrain is not needed at all
and in some cases that has led to a proper scope definition. As you
point out in your other examples it just happens that it is easier to
go an easy path and define scopes ad-hoc to work around allocation
API limitations.

[...]
IOWs, a large number of the users of the scope API simply make
[k]vmalloc() provide GFP_NOFS behaviour. ceph_kvmalloc() is pretty
much a wrapper that indicates how all vmalloc functions should
behave. Honour GFP_NOFS and GFP_NOIO by using the scope API
internally.
I was discouraging from this behavior at vmalloc level to push people
to use scopes properly - aka at the level where the reclaim recursion is
really a problem. If that is infeasible in practice then we can
re-evaluate of course. I was really hoping we can get rid of cargo cult
GFP_NOFS usage this way but the reality often disagrees with hopes.

All that being said, let's discuss [k]vmalloc constrains and usecases
that need changes in a separate email thread.

Thanks!
-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help