Re: [PATCH v3 2/5] mm: introduce external memory hinting API
From: Minchan Kim <hidden>
Date: 2020-02-10 21:27:36
Also in:
linux-mm, lkml
Hi Suren, On Mon, Feb 10, 2020 at 09:50:20AM -0800, Suren Baghdasaryan wrote:
On Mon, Jan 27, 2020 at 4:17 PM Minchan Kim [off-list ref] wrote:quoted
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. It's similar in spirit to madvise(MADV_WONTNEED), but the information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. int process_madvise(int pidfd, void *addr, size_t length, int advise, unsigned long flag); Since it could affect other process's address range, only privileged process(CAP_SYS_PTRACE) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later.I would definitely be interested in adding MADV_DONTNEED support for process_madvise() to allow quick memory reclaim after a kill. The scenario is that userspace daemon can kill a process and try to help reclaim its memory. Having process_madvise(MADV_DONTNEED) support helps in the following cases: 1. Process issuing process_madvise has a higher CPU bandwidth allowance than the victim process, therefore can reclaim victim's memory quicker. 2. In case the victim occupies large amounts of memory the process issuing process_madvise can spawn multiple (possibly high priority) threads each reclaiming portions of the victim's memory. Such an extension will add a destructive kind of madvise into the set supported by process_madvise and I want to make sure we can accomodate for that in the future. Do you see any issues with supporting MADV_DONTNEED in the future?
Or kernel could do by themselves to spawn mulitple tasks if the system has available badwidth and target process has a lot memory to be reclaimed Anyway, it doesn't have any issue because we already have some synchrnoization methods(e.g., signal or cgroup freezer) to freeze target processes before giving a hint. It's not different with usual write syscall on shared file among processes. < snip >
quoted
diff --git a/mm/madvise.c b/mm/madvise.c index 0c901de531e4..00ffa7e92f79 100644 --- a/mm/madvise.c +++ b/mm/madvise.c@@ -17,6 +17,7 @@ #include <linux/falloc.h> #include <linux/fadvise.h> #include <linux/sched.h> +#include <linux/sched/mm.h> #include <linux/ksm.h> #include <linux/fs.h> #include <linux/file.h>@@ -315,6 +316,8 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (fatal_signal_pending(task)) return -EINTR; + else if (current != task && fatal_signal_pending(current)) + return -EINTR;I think this can be simplified as: + if (fatal_signal_pending(current)) + return -EINTR; current != task condition is not needed because if current == task then you would return earlier after checking fatal_signal_pending(task).
True, I will remove it. Thanks!