Thread (13 messages), 3 authors, 2024-09-26

Re: [PATCH v2 1/2] mm/madvise: introduce PR_MADV_SELF flag to process_madvise()

From: Shakeel Butt <shakeel.butt@linux.dev>
Date: 2024-09-25 14:03:09
Also in: linux-alpha, linux-mips, linux-mm, lkml

Cc'ed Christian

On Tue, Sep 24, 2024 at 02:12:49PM GMT, Lorenzo Stoakes wrote:
On Tue, Sep 24, 2024 at 01:51:11PM GMT, Pedro Falcato wrote:
On Tue, Sep 24, 2024 at 12:16:27PM GMT, Lorenzo Stoakes wrote:
process_madvise() was conceived as a useful means for performing a vector
of madvise() operations on a remote process's address space.

However it's useful to be able to do so on the current process also. It is
currently rather clunky to do this (requiring a pidfd to be opened for the
current process) and introduces unnecessary overhead in incrementing
reference counts for the task and mm.

Avoid all of this by providing a PR_MADV_SELF flag, which causes
process_madvise() to simply ignore the pidfd parameter and instead apply
the operation to the current process.
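For reference, the pidfd round trip described above looks roughly like this from userspace. This is a minimal sketch, assuming libc headers recent enough to define SYS_pidfd_open and SYS_process_madvise; the helper name madvise_self_via_pidfd is made up for illustration, and the call may still fail (for example with EPERM or ENOSYS) depending on the kernel and privileges.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20 /* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper: apply `advice` to `n` ranges of the calling
 * process by opening a pidfd for ourselves first.
 * Returns the process_madvise() result, or -1 with errno set.
 */
static ssize_t madvise_self_via_pidfd(const struct iovec *vec, size_t n,
                                      int advice)
{
    /* Extra step 1: obtain a pidfd that refers to the current process. */
    int pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
    if (pidfd < 0)
        return -1;

    /* The actual operation; the flags argument is 0 here. */
    ssize_t ret = syscall(SYS_process_madvise, pidfd, vec, n, advice, 0);

    /* Extra step 2: drop the pidfd again, preserving errno. */
    int saved_errno = errno;
    close(pidfd);
    errno = saved_errno;
    return ret;
}
```

The open and close around the single process_madvise() call are the overhead the patch description refers to.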
How about simply defining a pseudo-fd PIDFD_SELF in the negative int space?
There's precedent for it in the fs space (AT_FDCWD). I think it's more ergonomic
and if you take out the errno space we have around 2^31 - 4096 available sentinel
values.

e.g:

/* AT_FDCWD = -100, -1 is dangerous, pick a different value */
#define PIDFD_SELF   -11

int pidfd = target_pid == getpid() ? PIDFD_SELF : pidfd_open(...);
process_madvise(pidfd, ...);


What do you think?
I like the way you're thinking, but I don't think this is something we can
do in the context of this series.

I mean, I totally accept using a flag here and ignoring the pidfd field is
_ugly_, no question. But I'm trying to find the smallest change that
achieves what we want.
I don't think "smallest change" should be the target. We are changing
user API and we should aim to make it as robust as possible against
possible misuse or making unintended assumptions.

The proposed implementation opened the door for the applications to
provide dummy pidfd if PR_MADV_SELF is used. You definitely need to
restrict it to some known value like -1 used by mmap() syscall.
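To make that concrete, a rough sketch of the kind of argument check I mean. Everything here is an assumption for illustration: the PR_MADV_SELF bit value, the helper name check_self_args, and the choice of -EINVAL are not taken from the patch.

```c
#include <errno.h>

/* Hypothetical flag value, for illustration only. */
#define PR_MADV_SELF (1U << 0)

/*
 * Hypothetical validation helper: when PR_MADV_SELF is set, only
 * pidfd == -1 is accepted, so callers cannot pass arbitrary dummy
 * values that a later kernel might want to give a meaning to.
 * Returns 0 if the combination is acceptable, -EINVAL otherwise.
 */
static int check_self_args(int pidfd, unsigned int flags)
{
    /* Reject unknown flag bits so they stay available for future use. */
    if (flags & ~PR_MADV_SELF)
        return -EINVAL;

    /* With PR_MADV_SELF, insist on the one known placeholder value. */
    if ((flags & PR_MADV_SELF) && pidfd != -1)
        return -EINVAL;

    /* Without the flag, pidfd is validated by the normal lookup path. */
    return 0;
}
```

With this, check_self_args(-1, PR_MADV_SELF) is accepted, while check_self_args(5, PR_MADV_SELF) is rejected instead of silently ignoring the 5.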
To add such a sentinel would be a change to the pidfd mechanism as a whole,
and we'd be left in the awkward situation that no other user of the pidfd
mechanism would be implementing this, but we'd have to expose this as a
general sentinel value for all pidfd users.
There might be future users which can take advantage of this. I can even
imagine pidfd_send_signal() can use PIDFD_SELF as well.
One nice thing with doing this as a flag is that, later, if somebody is
willing to do the larger task of having a special sentinel pidfd value to
mean 'the current process', we could use this in process_madvise() and
deprecate this flag :)
Once something is added to an API, particularly syscalls, the removal
is almost impossible.

Anyways, I don't have very strong opinion one way or other but whatever
we decide, let's make it robust.