Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2

From: Mike Christie <hidden>
Date: 2019-10-23 17:27:39
Also in: linux-fsdevel, linux-mm, linux-scsi, lkml

On 10/23/2019 02:11 AM, Michal Hocko wrote:

On Wed 23-10-19 07:43:44, Dave Chinner wrote:

On Tue, Oct 22, 2019 at 06:33:10PM +0200, Michal Hocko wrote:

Thanks for more clarification regarding PF_LESS_THROTTLE.

[...]

PF_IO_FLUSHER would mean that the user
context is a part of the IO path and therefore there are certain reclaim
recursion restrictions.

If PF_IO_FLUSHER just maps to PF_LESS_THROTTLE|PF_MEMALLOC_NOIO,
then I'm not sure we need a new definition. Maybe that's the prctl
flag name, but in the kernel we don't need a PF_IO_FLUSHER process
flag...

Yes, the internal implementation would do something like that. I was
more interested in the user space visible API at this stage. Something
generic enough, because exporting MEMALLOC flags is just a bad idea IMHO
(especially PF_MEMALLOC).

Do you mean we would do something like:

prctl()
....
case PF_SET_IO_FLUSHER:
        current->flags |= PF_MEMALLOC_NOIO;
....

or are you saying we would add a new PF_IO_FLUSHER flag and then extend
the existing PF_MEMALLOC_NOIO checks, like the one in current_gfp_context():

if (current->flags & (PF_MEMALLOC_NOIO | PF_IO_FLUSHER))
        flags &= ~(__GFP_IO | __GFP_FS);

?
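To make the two options concrete, here is a small userspace model (not kernel code) of how current_gfp_context() narrows an allocation mask. The MODEL_* bit values, model_task_flags and model_set_io_flusher() are made up for illustration; only the rule "NOIO strips __GFP_IO and __GFP_FS, NOFS strips only __GFP_FS" mirrors the kernel.

```c
/* Hypothetical bit values for a userspace model; the kernel's real
 * PF_* and __GFP_* values live in its headers and differ. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u
#define MODEL_PF_MEMALLOC_NOIO 0x2u
#define MODEL_PF_LESS_THROTTLE 0x4u

#define MODEL_GFP_IO     0x10u
#define MODEL_GFP_FS     0x20u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

/* Stand-in for current->flags of the calling task. */
static unsigned int model_task_flags;

/* Option one: the (hypothetical) prctl handler only translates the
 * user-visible request into existing internal flags. */
static void model_set_io_flusher(void)
{
	model_task_flags |= MODEL_PF_LESS_THROTTLE | MODEL_PF_MEMALLOC_NOIO;
}

/* Same shape as current_gfp_context(): NOIO removes both the I/O and
 * FS bits, NOFS removes only the FS bit. */
static unsigned int model_current_gfp_context(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

With option one the resulting mask comes straight from the existing NOIO check; with option two the check itself would also test a new PF_IO_FLUSHER bit. Either way a GFP_KERNEL-style request ends up with neither the I/O nor the FS bit set.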

This patch allows the userspace daemons to set the PF_MEMALLOC* flags
with prctl during their initialization so later allocations cannot
call back into them.

TBH I am not really happy to export these to userspace. They are
an internal implementation detail and userspace shouldn't really
care.
They care in these cases, because block/fs drivers must be able to make
forward progress during writes. To meet this guarantee, kernel block
drivers use mempools and memalloc/GFP flags.
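For readers less familiar with mempools: the idea is a reserve of min_nr preallocated elements that is used only when the normal allocator fails, and that is refilled first when elements are freed. Below is a userspace sketch of that idea with hypothetical demo_pool_* names. The kernel's mempool_alloc() is more involved: it first tries the allocator without direct reclaim, and when the reserve is empty a caller that may sleep waits for an element to come back instead of getting NULL.

```c
#include <stdlib.h>

/* Hypothetical userspace model of a mempool. */
struct demo_pool {
	void **reserve;    /* preallocated elements */
	size_t nr;         /* elements currently held in reserve */
	size_t min_nr;     /* reserve capacity */
	size_t elem_size;
};

/* Fill the reserve up front. On failure, p->nr counts the elements that
 * were allocated, so demo_pool_destroy() can clean up. */
static int demo_pool_init(struct demo_pool *p, size_t min_nr, size_t elem_size)
{
	p->nr = 0;
	p->min_nr = min_nr;
	p->elem_size = elem_size;
	p->reserve = calloc(min_nr, sizeof(*p->reserve));
	if (!p->reserve)
		return -1;
	for (; p->nr < min_nr; p->nr++) {
		p->reserve[p->nr] = malloc(elem_size);
		if (!p->reserve[p->nr])
			return -1;
	}
	return 0;
}

/* Try the normal allocator first; fall back to the reserve.
 * simulate_oom forces the fallback path for demonstration. */
static void *demo_pool_alloc(struct demo_pool *p, int simulate_oom)
{
	void *e = simulate_oom ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->nr > 0)
		return p->reserve[--p->nr];
	return NULL; /* the kernel version would wait here instead */
}

/* Refill the reserve before giving memory back to the allocator. */
static void demo_pool_free(struct demo_pool *p, void *e)
{
	if (p->nr < p->min_nr)
		p->reserve[p->nr++] = e;
	else
		free(e);
}

static void demo_pool_destroy(struct demo_pool *p)
{
	while (p->nr > 0)
		free(p->reserve[--p->nr]);
	free(p->reserve);
	p->reserve = NULL;
}
```

The guarantee comes from the reserve: as long as in-flight elements are eventually freed, a writer can always obtain min_nr elements without depending on reclaim succeeding.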

The userspace components of these block/fs drivers already do things
normal daemons do not in order to meet that guarantee: they mlock their
memory, disable the OOM killer, and preallocate the resources they have
control over. What they cannot control is reclaim, the way kernel drivers
can, so it's easy for us to deadlock when memory gets low.
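The measures listed above map onto ordinary Linux interfaces. A minimal sketch, assuming a C daemon: mlockall(2) with MCL_CURRENT | MCL_FUTURE, writing -1000 to /proc/self/oom_score_adj, and allocating and touching I/O buffers at startup. The demo_* helper names are hypothetical. Nothing here changes which reclaim paths the kernel may enter while allocating on the daemon's behalf, which is the gap the patch is trying to close.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Pin current and future pages so the daemon's memory is never paged out.
 * Fails without CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK. */
static int demo_lock_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* -1000 exempts the process from the OOM killer. Lowering the value needs
 * CAP_SYS_RESOURCE; the error surfaces when the buffered write is flushed. */
static int demo_disable_oom_killer(void)
{
	FILE *f = fopen("/proc/self/oom_score_adj", "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("-1000", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Allocate and touch buffers up front so servicing a request needs no
 * new allocations. Returns NULL (and frees everything) on failure. */
static void **demo_prealloc_buffers(size_t count, size_t size)
{
	void **bufs = calloc(count, sizeof(*bufs));

	if (!bufs)
		return NULL;
	for (size_t i = 0; i < count; i++) {
		bufs[i] = malloc(size);
		if (!bufs[i]) {
			while (i-- > 0)
				free(bufs[i]);
			free(bufs);
			return NULL;
		}
		memset(bufs[i], 0, size); /* fault the pages in now */
	}
	return bufs;
}
```

A daemon would typically preallocate first and call demo_lock_memory() last, so everything it set up is already resident when the lock is taken.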

OK, fair enough. How much control do they really need, though? Is a
single PF_IO_FLUSHER as explained above (essentially implying a GFP_NOIO
context) sufficient?

I think some of these userspace processes work at the filesystem
level and so really only need GFP_NOFS allocation (fuse), while
others work at the block device level (iscsi, nbd) and so need GFP_NOIO
allocation. So there's definitely an argument for providing both...
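Inside the kernel the two levels already exist as scoped APIs: memalloc_nofs_save()/memalloc_nofs_restore() and memalloc_noio_save()/memalloc_noio_restore(). The sketch below is a userspace model of that save/restore pattern, with made-up bit values and a global standing in for current->flags; in these terms a fuse-style daemon would correspond to the NOFS scope and an iscsi/nbd-style daemon to the NOIO scope.

```c
/* Hypothetical bit values; the real PF_MEMALLOC_* flags and the
 * memalloc_*_save()/restore() helpers live in the kernel headers. */
#define SCOPE_PF_MEMALLOC_NOFS 0x1u
#define SCOPE_PF_MEMALLOC_NOIO 0x2u

/* Stand-in for current->flags of the calling task. */
static unsigned int scope_task_flags;

/* Enter a no-FS-recursion scope; return the previous state of the bit
 * so that nested scopes restore correctly. */
static unsigned int scope_nofs_save(void)
{
	unsigned int old = scope_task_flags & SCOPE_PF_MEMALLOC_NOFS;

	scope_task_flags |= SCOPE_PF_MEMALLOC_NOFS;
	return old;
}

static void scope_nofs_restore(unsigned int old)
{
	scope_task_flags = (scope_task_flags & ~SCOPE_PF_MEMALLOC_NOFS) | old;
}

/* Same pattern for the stricter no-I/O scope (block-level daemons). */
static unsigned int scope_noio_save(void)
{
	unsigned int old = scope_task_flags & SCOPE_PF_MEMALLOC_NOIO;

	scope_task_flags |= SCOPE_PF_MEMALLOC_NOIO;
	return old;
}

static void scope_noio_restore(unsigned int old)
{
	scope_task_flags = (scope_task_flags & ~SCOPE_PF_MEMALLOC_NOIO) | old;
}
```

Returning the old bit rather than blindly clearing it is what lets an inner scope end without cancelling an outer one; a process-wide prctl setting would instead keep the bit set for the daemon's whole lifetime.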

The main question is whether more APIs are really necessary. Is there
any real problem with giving them only PF_IO_FLUSHER and letting both
groups use it? It will impose more reclaim restrictions on purely
FS-based ones, but is that a practical problem? If so, we can always add
PF_FS_$FOO later on.


I am not sure. I will have to defer to general FS experts like Dave, or
Martin and Damien for the specific fuse case. There do not seem to be a
lot of places where we check for __GFP_IO, so configs with fuse and
bcache, for example, are probably not a big deal. However, I am not very
familiar with some of the other code paths in the mm layer and how
filesystems interact with them.
