Thread: 10 messages, 3 authors, 2019-10-23

Re: [PATCH] Add prctl support for controlling PF_MEMALLOC V2

From: Michal Hocko <mhocko@kernel.org>
Date: 2019-10-22 16:43:55
Also in: linux-fsdevel, linux-mm, linux-scsi, lkml

On Tue 22-10-19 11:13:20, Mike Christie wrote:
On 10/22/2019 06:24 AM, Michal Hocko wrote:
On Mon 21-10-19 16:41:37, Mike Christie wrote:
There are several storage drivers like dm-multipath, iscsi, tcmu-runner,
and nbd that have userspace components that can run in the IO path. For
example, iscsi and nbd's userspace daemons may need to recreate a socket
and/or send IO on it, and dm-multipath's daemon multipathd may need to
send IO to figure out the state of paths and re-set them up.

In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the
memalloc_*_save/restore functions to control the allocation behavior,
but for userspace we would end up hitting an allocation that ended up
writing data back to the same device we are trying to allocate for.
Which code paths are we talking about here? Any ioctl or is this a
general syscall path? Can we mark the process in a more generic way?
It depends on the daemon. The common ones, for example iscsi and nbd,
need network-related calls like sendmsg, recvmsg, socket, etc.
tcmu-runner could need the network ones and also read and write when it
does IO to a FS or device. dm-multipath needs the sg io ioctls.
OK, so there is not a clear kernel entry point that could be explicitly
annotated. This would imply a per-task context. This is important
information. And I am wondering how those use cases ever worked in the
first place. This is not a minor detail.
 
E.g. we have PF_LESS_THROTTLE (used by nfsd). It doesn't affect the
reclaim recursion but it shows a pattern that doesn't really exhibit
too many internals. Maybe we need PF_IO_FLUSHER or similar?
I am not familiar with PF_IO_FLUSHER. If it prevents the recursion
problem then please send me details and I will look into it for the next
posting.
PF_IO_FLUSHER doesn't exist. I just wanted to point out that similarly
to PF_LESS_THROTTLE it should be a higher-level per-task flag rather
than something as low-level as direct control of the gfp allocation
context. PF_LESS_THROTTLE simply says that the task is part of the
reclaim process and therefore shouldn't be subject to normal
throttling - whatever that means. PF_IO_FLUSHER would mean that the user
context is a part of the IO path and therefore there are certain reclaim
recursion restrictions.
 
This patch allows the userspace daemons to set the PF_MEMALLOC* flags
with prctl during their initialization so later allocations cannot
call back into them.
TBH I am not really happy to export these to the userspace. They are
an internal implementation detail and the userspace shouldn't really
They care in these cases, because block/fs drivers must be able to make
forward progress during writes. To meet this guarantee kernel block
drivers use mempools and memalloc/GFP flags.

The userspace components of the block/fs drivers already do things
normal daemons do not in order to meet that guarantee, like mlock their
memory, disable the oom killer, and preallocate resources they have
control over. Unlike the kernel drivers, they have no control over
reclaim, so it's easy for us to deadlock when memory gets low.
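
The precautions listed above can be sketched in C as follows. This is a
minimal illustration, not code from any of the daemons mentioned: the
helper names (lock_all_memory, disable_oom_kill, prealloc_pool) are
hypothetical, and lowering /proc/self/oom_score_adj normally requires
CAP_SYS_RESOURCE, so each step reports failure instead of aborting.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: pin current and future mappings in RAM so the
 * daemon never has to page its own memory back in from the device it
 * is serving. Returns 0 on success, -1 on failure. */
static int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: ask the kernel never to pick this process as an
 * OOM victim. Writing -1000 usually needs CAP_SYS_RESOURCE. */
static int disable_oom_kill(void)
{
    static const char val[] = "-1000";
    const size_t len = sizeof(val) - 1;
    int fd = open("/proc/self/oom_score_adj", O_WRONLY);

    if (fd < 0) {
        perror("open oom_score_adj");
        return -1;
    }
    ssize_t n = write(fd, val, len);
    if (n != (ssize_t)len) {
        perror("write oom_score_adj");
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: allocate a buffer pool up front and touch every
 * page, so no new allocation is needed once IO is in flight. */
static void *prealloc_pool(size_t bytes)
{
    void *p = malloc(bytes);

    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}
```

A daemon would typically call these once at startup, before entering
the IO path. None of them influences what the kernel does when a
syscall made by the daemon has to allocate memory, which is the gap
being discussed here.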
OK, fair enough. How much control do they really need, though? Is a
single PF_IO_FLUSHER as explained above (essentially implying a GFP_NOIO
context) sufficient?
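
For illustration, a single per-task flag of this kind could be requested
from userspace roughly as sketched below. Nothing in the patch under
discussion defines this interface. The sketch assumes the
PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER prctl options that later
mainline kernels provide in linux/prctl.h (values 57 and 58), which
require CAP_SYS_RESOURCE. The helper names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Assumption: these options come from later mainline kernels; define
 * them here in case the installed headers predate them. */
#ifndef PR_SET_IO_FLUSHER
#define PR_SET_IO_FLUSHER 57
#define PR_GET_IO_FLUSHER 58
#endif

/* Hypothetical helper: mark the calling task as part of the IO path.
 * Returns 0 on success, -1 if the kernel does not know the option
 * (EINVAL) or the caller lacks the capability (EPERM). */
static int mark_io_flusher(void)
{
    if (prctl(PR_SET_IO_FLUSHER, 1, 0, 0, 0) != 0) {
        fprintf(stderr, "PR_SET_IO_FLUSHER: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if the task is marked, 0 if it is
 * not, and -1 if the state cannot be queried. */
static int is_io_flusher(void)
{
    return prctl(PR_GET_IO_FLUSHER, 0, 0, 0, 0);
}
```

The daemon would make the call once during initialization and treat a
failure as a signal to fall back to its existing precautions.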
-- 
Michal Hocko
SUSE Labs