Thread (43 messages) 43 messages, 8 authors, 2020-06-11

Re: [PATCH v7 5/7] mm: support both pid and pidfd for process_madvise

From: Suren Baghdasaryan <surenb@google.com>
Date: 2020-05-12 19:55:47
Also in: linux-mm, lkml

On Sat, May 9, 2020 at 4:14 PM Minchan Kim [off-list ref] wrote:
Hi Christian,

On Sat, May 09, 2020 at 02:48:17PM +0200, Christian Brauner wrote:
quoted
On Fri, May 08, 2020 at 04:04:15PM -0700, Andrew Morton wrote:
quoted
On Fri, 8 May 2020 11:36:53 -0700 Minchan Kim [off-list ref] wrote:
quoted
...

Per Vlastimil's request, I changed "which and advise" with "idtype and
advice" in function prototype of description.
Could you replace the part in the description? Code is never changed.
Done, but...
quoted
...

There is a demand[1] to support pid as well pidfd for process_madvise to
reduce unnecessary syscall to get pidfd if the user has control of the
target process(ie, they could guarantee the process is not gone or pid is
not reused).

This patch aims for supporting both options like waitid(2).  So, the
syscall is currently,

        int process_madvise(idtype_t idtype, id_t id, void *addr,
                size_t length, int advice, unsigned long flags);

@which is actually idtype_t for userspace libray and currently, it
supports P_PID and P_PIDFD.
What does "@which is actually idtype_t for userspace libray" mean?  Can
you clarify and expand?
If I may clarify, the only case where we've supported both pidfd and pid
in the same system call is waitid() to avoid adding a dedicated system
call for waiting and because waitid() already had this (imho insane)
argument type switching. The idtype_t thing comes from waitid() and is
located int sys/wait.h and is defined as

"The type idtype_t is defined as an enumeration type whose possible
values include at least the following:

P_ALL
P_PID
P_PGID
"

int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);
If idtype is P_PID, waitid() shall wait for the child with a process ID equal to (pid_t)id.
If idtype is P_PGID, waitid() shall wait for any child with a process group ID equal to (pid_t)id.
If idtype is P_ALL, waitid() shall wait for any children and id is ignored.

I'm personally not a fan of this idtype_t thing and think this should
just have been
quoted
quoted
        int pidfd_madvise(int pidfd, void *addr,
                size_t length, int advice, unsigned long flags);
and call it a day.
That was the argument at that time, Daniel and I didn't want to have
pid along with pidfd even though Kirill strongly wanted to have it.
However you said " Overall, I don't particularly care how or if you
integrate pidfd here." at that time.

https://lore.kernel.org/linux-mm/20200113104256.5ujbplyec2sk4onn@wittgenstein/ (local)

I asked a question to Kirll at that time.

"
quoted
Sounds like that you want to support both options for every upcoming API
which deals with pid. I'm not sure how it's critical for process_madvise
API this case. In general, we sacrifice some performance for the nicer one
and later, once it's reported as hurdle for some workload, we could fix it
via introducing new flag. What I don't like at this moment is to make
syscall complicated with potential scenarios without real workload.
Yes, I suggest allowing both options for every new process api
"
https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/ (local)

You didn't give the opinion at that time, either(I expected you will
make some voice then). What I could do to proceed work was separate it
as different patch like this one to get more attention in future.
And now it works.

Let me clarify my side: I still don't like to introduce pid for new API
since we have pidfd. Since you just brought this issue again, I want to
hear *opinions* from others, again.

IIRC Kirill's main complaint was that if we support only pidfds and
userspace has a pid of the process then it would have to convert that
pid into pidfd before calling process_madvise, which involves
additional syscall(s). The overhead would be more tangible if there
are multiple processes needing to be madvised.
I'm not sure how often such a need arises to madvise multiple
processes in a bulk like that and how critical is the overhead of
obtaining pidfd. With pid reuse possibility pid-based API will still
have the issue of possibly sending the request to a wrong process, so
this pidfd obtaining overhead arguably makes the usage more robust and
therefore is not a pure loss.

I don't have a real strong opinion against supporting pid in this
syscall but I think API maintainers should decide going forward
whether new APIs should support pid along with pidfd or switch to
pidfd only.
Thanks!
quoted
Also, if I may ask, why is the flag argument "unsigned long"?
That's pretty unorthodox. The expectation is that flag arguments are
not word-size dependent and should usually use "unsigned int". All new
system calls follow this pattern too.
Nothing special in this flag: Let me change it as "unsigned int".
I will send the change once we have an agreement on "pidfd" argument.

Thanks for the review, Christian!
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help