Thread: 26 messages, 4 authors, 2023-11-28

Re: [RFC PATCH 00/11] mm/mempolicy: Make task->mempolicy externally modifiable via syscall and procfs

From: Michal Hocko <mhocko@suse.com>
Date: 2023-11-27 15:29:59
Also in: linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml

Sorry, I didn't have much time to do a proper review. A couple of points
here at least.

On Wed 22-11-23 17:24:10, Gregory Price wrote:
> On Wed, Nov 22, 2023 at 01:33:48PM -0800, Andrew Morton wrote:
> > On Wed, 22 Nov 2023 16:11:49 -0500 Gregory Price [off-list ref] wrote:
> > > The patch set changes task->mempolicy to be modifiable by tasks other
> > > than just current.
> > >
> > > The ultimate goal is to make mempolicy more flexible and extensible,
> > > such as adding interleave weights (which may need to change at runtime
> > > due to hotplug events).  Making mempolicy externally modifiable allows
> > > for userland daemons to make runtime performance adjustments to running
> > > tasks without that software needing to be made numa-aware.
> >
> > Please add to this [0/N] a full description of the security aspect: who
> > can modify whose mempolicy, along with a full description of the
> > reasoning behind this decision.
>
> Will do. For the sake of v0 for now:
>
> 1) the task itself (task == current)
>    for obvious reasons: it already can
>
> 2) from external interfaces: CAP_SYS_NICE

Makes sense.
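
The v0 rule quoted above can be modelled in a few lines. This is only an
illustrative sketch, not code from the patch set: `Task`,
`has_cap_sys_nice`, and `may_modify_mempolicy` are hypothetical names, and
the real check would live in the kernel against the caller's credentials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task; not a kernel structure."""
    pid: int
    has_cap_sys_nice: bool = False


def may_modify_mempolicy(caller: Task, target: Task) -> bool:
    """Model of the v0 rule: self-modification, or CAP_SYS_NICE."""
    # Rule 1: a task may always change its own policy (task == current).
    if caller.pid == target.pid:
        return True
    # Rule 2: any external caller needs CAP_SYS_NICE.
    return caller.has_cap_sys_nice
```

Under this model an unprivileged daemon could not retune another task, which
is the property the [0/N] description is being asked to spell out.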

[...]
> > > 3. Add external interfaces which allow for a task mempolicy to be
> > >    modified by another task.  This is implemented in 4 syscalls
> > >    and a procfs interface:
> > >         sys_set_task_mempolicy
> > >         sys_get_task_mempolicy
> > >         sys_set_task_mempolicy_home_node
> > >         sys_task_mbind
> > >         /proc/[pid]/mempolicy
> >
> > Why is the procfs interface needed?  Doesn't it simply duplicate the
> > syscall interface?  Please update [0/N] with a description of this
> > decision.
>
> Honestly I wrote the procfs interface first, and then came back around
> to just implement the syscalls.  mbind is not friendly to being procfs'd
> so if the preference is to have only one, not both, then it should
> probably be the syscalls.
>
> That said, when I introduce weighted interleave on top of this, having a
> simple procfs interface to those weights would be valuable, so I
> imagined something like `proc/mempolicy` to determine if interleave was
> being used and something like `proc/mpol_interleave_weights` for a clean
> interface to update weights.
>
> However, in the same breath, I have a prior RFC with set/get_mempolicy2
> which could probably take all future mempolicy extensions and wrap them
> up into one pair of syscalls, instead of us ending up with 200 more
> sys_mempolicy_whatever as memory attached fabrics become more common.
>
> So... yeah... this is one area I think the community very much needs to
> comment:  set/get_mempolicy2, many new mempolicy syscalls, procfs? All
> of the above?

I think we should actively avoid using proc interface. The most
reasonable way would be to add get_mempolicy2 interface that would allow
extensions and then create a pidfd counterpart to allow acting on a
remote task. The latter would require some changes to make mempolicy
code less current oriented.
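
One way an interface could "allow extensions" is the size-versioned
argument struct convention already used by syscalls such as openat2 and
clone3: the caller passes the struct together with its size, and the kernel
tolerates both older (shorter) and newer (longer) layouts. The sketch below
models that convention in userspace Python. It is an assumption about how a
get_mempolicy2-style interface might be shaped, not something specified in
this thread; `KERNEL_ARGS_SIZE` and `copy_struct` are hypothetical names.

```python
import errno

# Hypothetical size of the argument struct that the "kernel" side knows.
KERNEL_ARGS_SIZE = 16


def copy_struct(user_buf: bytes, ksize: int = KERNEL_ARGS_SIZE) -> bytes:
    """Model of size-versioned struct copying.

    Older callers pass a shorter struct: the missing fields read as zero.
    Newer callers pass a longer struct: it is accepted only if every byte
    the kernel does not know about is zero, otherwise E2BIG is raised.
    """
    usize = len(user_buf)
    if usize < ksize:
        # Old userspace, new kernel: zero-fill the fields it never set.
        return user_buf + bytes(ksize - usize)
    if any(user_buf[ksize:]):
        # New userspace asked for a feature this kernel cannot honour.
        raise OSError(errno.E2BIG, "unknown non-zero fields in argument struct")
    # Same size, or a longer struct whose extra fields are all zero.
    return user_buf[:ksize]
```

With this shape, a later extension such as interleave weights would append
fields to the struct rather than require another syscall.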
-- 
Michal Hocko
SUSE Labs