Thread: 27 messages, 8 authors, 2024-09-10

Re: [PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM

From: Michal Hocko <mhocko@suse.com>
Date: 2024-09-03 07:06:19
Also in: linux-bcachefs, linux-fsdevel, linux-mm, lkml

On Mon 02-09-24 18:32:33, Kent Overstreet wrote:
On Mon, Sep 02, 2024 at 02:52:52PM GMT, Andrew Morton wrote:
On Mon, 2 Sep 2024 05:53:59 -0400 Kent Overstreet [off-list ref] wrote:
On Mon, Sep 02, 2024 at 11:51:48AM GMT, Michal Hocko wrote:
The previous version has been posted in [1]. Based on the review feedback
I have sent v2 of patches in the same thread but it seems that the
review has mostly settled on these patches. There is still an open
discussion on whether having a NORECLAIM allocator semantic (compared to
atomic) is worthwhile or how to deal with broken GFP_NOFAIL users but
those are not really relevant to this particular patchset as it 1)
doesn't aim to implement either of the two and 2) it aims at stopping
the spread of PF_MEMALLOC_NORECLAIM use while it doesn't have a properly
defined semantic, now that it is not widely used, rather than later when
it would be much harder to fix.

I have collected Reviewed-bys and am reposting here. These patches are
touching bcachefs, VFS and core MM so I am not sure which tree to merge
this through but I guess going through Andrew makes the most sense.

Changes since v1:
- compile fixes
- rather than dropping PF_MEMALLOC_NORECLAIM alone reverted eab0af905bfc
  ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") suggested
  by Matthew.
To reiterate:
It would be helpful to summarize your concerns.

What runtime impact do you expect this change will have upon bcachefs?
For bcachefs: I try really hard to minimize tail latency and make
performance robust in extreme scenarios - thrashing. A large part of
that is that btree locks must be held for no longer than necessary.

We definitely don't want to recurse into other parts of the kernel,
taking other locks (i.e. in memory reclaim) while holding btree locks;
that's a great way to stack up (and potentially multiply) latencies.
OK, these two patches do not fail to do that. The only existing user is
turned into GFP_NOWAIT so the final code works the same way. Right?
But gfp flags don't work with vmalloc allocations (and that's unlikely
to change), and we require vmalloc fallbacks for e.g. btree node
allocation. That's the big reason we want PF_MEMALLOC_NORECLAIM.
Have you even tried to reach out to vmalloc maintainers and asked for
GFP_NOWAIT support for vmalloc? Because I do not remember that. Sure
kernel page tables have a hardcoded GFP_KERNEL context which slightly
complicates that but that doesn't really mean the only potential
solution is to use a per-task flag to override that. Just off the top of my
head we can consider pre-allocating virtual address space for
non-sleeping allocations. Maybe there are other options that only people
deeply familiar with the vmalloc internals can see.

This requires discussion, not pushing a very particular solution
through.
-- 
Michal Hocko
SUSE Labs
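
For readers following the GFP_NOWAIT point in the message above, here is
a minimal userspace sketch of the general pattern under discussion: try
a non-sleeping allocation while a lock is held, and only if that fails
drop the lock before attempting an allocation that may block. It is not
the bcachefs code. The pthread mutex standing in for a btree lock, the
malloc() calls standing in for kernel allocators, the drop-and-retry
slow path, and every name below (btree_lock, alloc_nowait,
alloc_may_block, alloc_under_lock, simulate_pressure, lock_drops) are
assumptions made for illustration.

```c
/*
 * Userspace illustration only. A pthread mutex stands in for a btree
 * lock and malloc() stands in for the kernel allocators. All names
 * here are hypothetical and do not come from the patches.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t btree_lock = PTHREAD_MUTEX_INITIALIZER;

/* Knob to force the non-sleeping attempt to fail, as it may under pressure. */
static bool simulate_pressure;

/* Counts how many times the slow path had to drop the lock. */
static int lock_drops;

/* Stand-in for a GFP_NOWAIT style allocation: never blocks, may return NULL. */
static void *alloc_nowait(size_t size)
{
	return simulate_pressure ? NULL : malloc(size);
}

/* Stand-in for an allocation that is allowed to sleep (e.g. enter reclaim). */
static void *alloc_may_block(size_t size)
{
	return malloc(size);
}

/* Must be called with btree_lock held; returns with btree_lock held. */
static void *alloc_under_lock(size_t size)
{
	void *p;

	assert(size > 0);

	/* Fast path: no sleeping while the lock is held. */
	p = alloc_nowait(size);
	if (p)
		return p;

	/* Slow path: drop the lock so a blocking allocation never runs under it. */
	pthread_mutex_unlock(&btree_lock);
	lock_drops++;
	p = alloc_may_block(size);
	pthread_mutex_lock(&btree_lock);

	/* A real caller would need to revalidate its state after relocking. */
	return p;
}
```

A caller would take btree_lock, call alloc_under_lock(), and treat any
state it looked up before the call as possibly stale whenever the slow
path ran. The sketch does not address the vmalloc case raised in the
thread, where the allocation context is not controlled by the gfp flags
the caller passes.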