Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2
From: Mike Christie <hidden>
Date: 2019-10-23 17:27:39
Also in:
linux-fsdevel, linux-mm, linux-scsi, lkml
On 10/23/2019 02:11 AM, Michal Hocko wrote:
> On Wed 23-10-19 07:43:44, Dave Chinner wrote:
> > On Tue, Oct 22, 2019 at 06:33:10PM +0200, Michal Hocko wrote:
> > > Thanks for more clarification regarding PF_LESS_THROTTLE.
> > > [...]
> > > PF_IO_FLUSHER would mean that the user context is a part of the IO path and therefore there are certain reclaim recursion restrictions.
> >
> > If PF_IO_FLUSHER just maps to PF_LESS_THROTTLE|PF_MEMALLOC_NOIO, then I'm not sure we need a new definition. Maybe that's the ptrace flag name, but in the kernel we don't need a PF_IO_FLUSHER process flag...
>
> Yes, the internal implementation would do something like that. I was more interested in the user space visible API at this stage. Something generic enough, because exporting MEMALLOC flags is just a bad idea IMHO (especially PF_MEMALLOC).

Do you mean we would do something like:
	prctl()
	....
	case PF_SET_IO_FLUSHER:
		current->flags |= PF_MEMALLOC_NOIO;
	....

or are you saying we would add a new PF_IO_FLUSHER flag and then modify
PF_MEMALLOC_NOIO uses like in current_gfp_context:

	if (current->flags & (PF_MEMALLOC_NOIO | PF_IO_FLUSHER))
		flags &= ~(__GFP_IO | __GFP_FS);

?
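A small userspace model may help compare the two options. It is only a sketch: the bit values are made up for illustration (they are not the real ones from include/linux/sched.h or include/linux/gfp.h), PF_IO_FLUSHER is the hypothetical flag from the question above, and model_gfp_context() is a simplified stand-in for current_gfp_context(), which in kernels of this era already strips __GFP_IO and __GFP_FS for PF_MEMALLOC_NOIO and only __GFP_FS for PF_MEMALLOC_NOFS. Both options yield the same allocation mask; they differ only in where the state is recorded and how many call sites have to know about it.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values chosen for this model only. */
#define __GFP_RECLAIM     0x1u
#define __GFP_IO          0x2u
#define __GFP_FS          0x4u
#define GFP_KERNEL        (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

#define PF_MEMALLOC_NOFS  0x10u
#define PF_MEMALLOC_NOIO  0x20u
#define PF_IO_FLUSHER     0x40u  /* hypothetical new task flag (option 2) */

/* Option 1: the prctl handler sets the existing internal flag, so the
 * allocation paths need no change. */
static unsigned int option1_set_io_flusher(unsigned int task_flags)
{
	return task_flags | PF_MEMALLOC_NOIO;
}

/* Option 2: record a separate PF_IO_FLUSHER flag and teach every
 * PF_MEMALLOC_NOIO check to honour it as well. */
static unsigned int option2_set_io_flusher(unsigned int task_flags)
{
	return task_flags | PF_IO_FLUSHER;
}

/* Simplified model of current_gfp_context() including option 2's extra
 * check: NOIO-like context strips __GFP_IO and __GFP_FS, NOFS context
 * strips only __GFP_FS. */
static gfp_t model_gfp_context(unsigned int task_flags, gfp_t flags)
{
	if (task_flags & (PF_MEMALLOC_NOIO | PF_IO_FLUSHER))
		flags &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		flags &= ~__GFP_FS;
	return flags;
}
```

With either option a GFP_KERNEL request from the daemon degrades to a plain reclaim-only mask; option 2 just spreads the knowledge of the new flag over more call sites.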
> > > > > > This patch allows userspace daemons to set the PF_MEMALLOC* flags with prctl during their initialization so later allocations cannot call back into them.
> > > > >
> > > > > TBH I am not really happy to export these to the userspace. They are an internal implementation detail and the userspace shouldn't really
> > > >
> > > > They care in these cases, because block/fs drivers must be able to make forward progress during writes. To meet this guarantee, kernel block drivers use mempools and memalloc/GFP flags. The userspace components of these block/fs drivers already do things normal daemons do not to meet that guarantee, such as mlocking their memory, disabling the OOM killer, and preallocating the resources they have control over. They have no control over reclaim like the kernel drivers do, though, so it is easy for us to deadlock when memory gets low.
> > >
> > > OK, fair enough. How much control do they really need, though? Is a single PF_IO_FLUSHER as explained above (essentially implying GFP_NOIO context) sufficient?
> >
> > I think some of these userspace processes work at the filesystem level and so really only need GFP_NOFS allocation (fuse), while others work at the block device level (iscsi, nbd) and so need GFP_NOIO allocation. So there's definitely an argument for providing both...
>
> The main question is whether giving more APIs is really necessary. Is there any real problem with giving them only PF_IO_FLUSHER and letting both groups use it? It will imply more reclaim restrictions for the solely FS-based ones, but is this a practical problem? If so, we can always add PF_FS_$FOO later on.

I am not sure. I will have to defer to general FS experts like Dave, or Martin and Damien, for the specific fuse case. There do not seem to be many places that check for __GFP_IO, so configs with fuse and bcache, for example, are probably not a big deal. However, I am not very familiar with some of the other code paths in the mm layer and how filesystems interact with them.
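The trade-off in question can be reduced to the bits a NOFS-only daemon (such as a FUSE server) would lose if the single PF_IO_FLUSHER option always imposed NOIO context. The sketch below uses illustrative bit values; only the composition of GFP_NOIO, GFP_NOFS and GFP_KERNEL follows include/linux/gfp.h, and lost_by_single_flag() is a hypothetical name. The difference is exactly __GFP_IO, so under this reading the extra restriction only bites in reclaim paths that test __GFP_IO.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values chosen for this model only. */
#define __GFP_RECLAIM 0x1u
#define __GFP_IO      0x2u
#define __GFP_FS      0x4u

/* Same composition as the kernel's masks. */
#define GFP_NOIO   (__GFP_RECLAIM)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* Reclaim permissions a NOFS-only daemon gives up when forced into
 * NOIO context by a single, stricter IO-flusher option. */
static gfp_t lost_by_single_flag(void)
{
	return GFP_NOFS & ~GFP_NOIO;
}
```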