Thread (13 messages) 13 messages, 3 authors, 2020-02-11

Re: [PATCH v3 2/5] mm: introduce external memory hinting API

From: Minchan Kim <hidden>
Date: 2020-02-10 21:27:36
Also in: linux-mm, lkml

Hi Suren,

On Mon, Feb 10, 2020 at 09:50:20AM -0800, Suren Baghdasaryan wrote:
On Mon, Jan 27, 2020 at 4:17 PM Minchan Kim [off-list ref] wrote:
quoted
There is usecase that System Management Software(SMS) want to give
a memory hint like MADV_[COLD|PAGEEOUT] to other processes and
in the case of Android, it is the ActivityManagerService.

It's similar in spirit to madvise(MADV_WONTNEED), but the information
required to make the reclaim decision is not known to the app. Instead,
it is known to the centralized userspace daemon(ActivityManagerService),
and that daemon must be able to initiate reclaim on its own without
any app involvement.

To solve the issue, this patch introduces a new syscall process_madvise(2).
It uses pidfd of an external process to give the hint.

 int process_madvise(int pidfd, void *addr, size_t length, int advise,
                        unsigned long flag);

Since it could affect other process's address range, only privileged
process(CAP_SYS_PTRACE) or something else(e.g., being the same UID)
gives it the right to ptrace the process could use it successfully.
The flag argument is reserved for future use if we need to extend the
API.

I think supporting all hints madvise has/will supported/support to
process_madvise is rather risky. Because we are not sure all hints make
sense from external process and implementation for the hint may rely on
the caller being in the current context so it could be error-prone.
Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch.

If someone want to add other hints, we could hear hear the usecase and
review it for each hint. It's safer for maintenance rather than
introducing a buggy syscall but hard to fix it later.
I would definitely be interested in adding MADV_DONTNEED support for
process_madvise() to allow quick memory reclaim after a kill. The
scenario is that userspace daemon can kill a process and try to help
reclaim its memory. Having process_madvise(MADV_DONTNEED) support
helps in the following cases:
1. Process issuing process_madvise has a higher CPU bandwidth
allowance than the victim process, therefore can reclaim victim's
memory quicker.
2. In case the victim occupies large amounts of memory the process
issuing process_madvise can spawn multiple (possibly high priority)
threads each reclaiming portions of the victim's memory.
Such an extension will add a destructive kind of madvise into the set
supported by process_madvise and I want to make sure we can accomodate
for that in the future. Do you see any issues with supporting
MADV_DONTNEED in the future?
Or kernel could do by themselves to spawn mulitple tasks if the system has
available badwidth and target process has a lot memory to be reclaimed

Anyway, it doesn't have any issue because we already have some synchrnoization
methods(e.g., signal or cgroup freezer) to freeze target processes before
giving a hint. It's not different with usual write syscall on shared file
among processes.

< snip >
quoted
diff --git a/mm/madvise.c b/mm/madvise.c
index 0c901de531e4..00ffa7e92f79 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -17,6 +17,7 @@
 #include <linux/falloc.h>
 #include <linux/fadvise.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/ksm.h>
 #include <linux/fs.h>
 #include <linux/file.h>
@@ -315,6 +316,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,

        if (fatal_signal_pending(task))
                return -EINTR;
+       else if (current != task && fatal_signal_pending(current))
+               return -EINTR;
I think this can be simplified as:

+       if (fatal_signal_pending(current))
+               return -EINTR;

current != task condition is not needed because if current == task
then you would return earlier after checking
fatal_signal_pending(task).
True, I will remove it.

Thanks!
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help