Re: [PATCH v9 3/3] mm/madvise: introduce process_madvise() syscall: an external memory hinting API
From: Michal Hocko <mhocko@suse.com>
Date: 2020-09-21 07:14:15
Also in:
linux-man, linux-mm, lkml
On Mon 21-09-20 07:56:33, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 05:06:33PM -0700, Minchan Kim wrote:
> > There is usecase that System Management Software(SMS) want to give a
> > memory hint like MADV_[COLD|PAGEOUT] to other processes and in the
> > case of Android, it is the ActivityManagerService.
> >
> > The information required to make the reclaim decision is not known to
> > the app. Instead, it is known to the centralized userspace
> > daemon(ActivityManagerService), and that daemon must be able to
> > initiate reclaim on its own without any app involvement.
> >
> > To solve the issue, this patch introduces a new syscall
> > process_madvise(2). It uses pidfd of an external process to give the
> > hint. It also supports vector address range because Android app has
> > thousands of vmas due to zygote so it's totally waste of CPU and power
> > if we should call the syscall one by one for each vma.(With testing
> > 2000-vma syscall vs 1-vector syscall, it showed 15% performance
> > improvement. I think it would be bigger in real practice because the
> > testing ran very cache friendly environment).
>
> I'm really not sure this syscall is a good idea. If you want central
> control you should implement an IPC mechanism that allows your
> supervisor daemon to tell the application to perform the madvise
> instead of forcing the behavior on it.

I am not entirely happy about the interface [1], but since I seem to be
in the minority with that concern I have backed off and decided not to
block this work, because I do not see a problem with the functionality
itself. I find it very useful for the userspace-driven memory management
that people have been asking for for a long time.

This functionality shouldn't be much different from standard memory
reclaim. It has some limitations (e.g. it can only handle mapped memory),
but it allows userspace to proactively swap out or reclaim disk-based
memory based on specific knowledge of the workload. The kernel is not
able to do the same.

[1] http://lkml.kernel.org/r/20200117115225.GV19428@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs
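
For readers following the thread, here is a rough sketch of how a
supervisor daemon might use the interface under discussion: open a pidfd
for the target and pass one vector of address ranges in a single call.
It is only an illustration, not code from the patch. struct hint_range,
build_hint_vec() and hint_remote() are hypothetical names, the fallback
syscall numbers are assumed from the generic Linux syscall table (they
differ on some architectures), and the raw syscall(2) path is used on
the assumption that libc may not provide wrappers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed generic values; check your arch). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical descriptor for one address range in the target process. */
struct hint_range {
	unsigned long start;
	size_t len;
};

/*
 * Copy the non-empty ranges into an iovec array of capacity cap.
 * Returns the number of entries written.
 */
size_t build_hint_vec(const struct hint_range *r, size_t n,
		      struct iovec *vec, size_t cap)
{
	size_t i, out = 0;

	assert(vec != NULL || cap == 0);
	for (i = 0; i < n && out < cap; i++) {
		if (r[i].len == 0)
			continue;	/* skip empty ranges */
		vec[out].iov_base = (void *)r[i].start;
		vec[out].iov_len = r[i].len;
		out++;
	}
	return out;
}

/*
 * Hint all ranges in vec to the process identified by pid in one call.
 * advice would be MADV_COLD or MADV_PAGEOUT; flags must currently be 0.
 * Returns the number of bytes advised, or -1 with errno set (e.g. ENOSYS
 * on kernels without the syscall, EPERM without sufficient privilege).
 */
ssize_t hint_remote(pid_t pid, const struct iovec *vec, size_t vlen,
		    int advice)
{
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0U);
	ssize_t ret;
	int saved;

	if (pidfd < 0)
		return -1;
	ret = syscall(SYS_process_madvise, pidfd, vec, vlen, advice, 0U);
	saved = errno;		/* close() may clobber errno */
	close(pidfd);
	errno = saved;
	return ret;
}
```

A daemon would typically take the ranges from the target's
/proc/<pid>/maps, build the vector once with build_hint_vec(), and then
call hint_remote(pid, vec, vlen, MADV_PAGEOUT). That single call replaces
the per-vma loop mentioned in the changelog.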