Re: [PATCH RFC v2] Add /proc/pid/smaps_rollup
From: Michal Hocko <mhocko@kernel.org>
Date: 2017-08-10 10:58:52
Also in: linux-fsdevel, linux-mm, lkml
On Thu 10-08-17 03:23:23, Daniel Colascione wrote:
> Thanks for taking a look at the patch!
>
> On Thu, Aug 10 2017, Michal Hocko wrote:
>> [CC linux-api - the patch was posted here http://lkml.kernel.org/r/20170810001557.147285-1-dancol@google.com]
>>
>> On Thu 10-08-17 13:38:31, Minchan Kim wrote:
>>> On Wed, Aug 09, 2017 at 05:15:57PM -0700, Daniel Colascione wrote:
>>>> /proc/pid/smaps_rollup is a new proc file that improves the performance of user programs that determine aggregate memory statistics (e.g., total PSS) of a process.
>>>>
>>>> Android regularly "samples" the memory usage of various processes in order to balance its memory pool sizes. This sampling process involves opening /proc/pid/smaps and summing certain fields. For very large processes, sampling memory use this way can take several hundred milliseconds, due mostly to the overhead of the seq_printf calls in task_mmu.c.
>>
>> Have you tried to reduce that overhead? E.g. by replacing seq_printf with something simpler http://lkml.kernel.org/r/20160817130320.GC20703@dhcp22.suse.cz?
>
> I haven't tried that yet, but if I'm reading that thread correctly, it looks like using more efficient printing primitives gives us a 7% speedup. The smaps_rollup patch gives us a much bigger speedup while reusing almost all the smaps code, so it seems easier and simpler than a bunch of incremental improvements to smaps. And even an efficient smaps would have to push 2MB through seq_file for the 3000-VMA process case.

The thing is that more users would benefit from a more efficient /proc/pid/smaps. Maybe we can use some caching tricks etc. We should make sure that the existing options have been tried before a new user-visible interface is added. It is kind of sad that the real work (the pte walk) is less expensive than formatting the output and copying it to userspace...

>> How often do you need to read this information?
>
> It varies depending on how often processes change state. We sample a short time (tens of seconds) after a process changes state (e.g., enters the foreground) and every few minutes thereafter. We're particularly concerned, from an energy perspective, about needlessly burning CPU on background samples.

Please make sure this is documented in the patch, ideally along with some numbers.
[...]

>>> FYI, there was a similar attempt before, but it failed at the time:
>>> https://marc.info/?l=linux-kernel&m=147310650003277&w=2
>>> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1229163.html
>>
>> Yes, I really disliked the previous attempt and this one is not all that much better. The primary unanswered question back then was a relevant use case. Back then it was argued [1] that PSS was useful for userspace OOM handling, but the arguments were rather dubious. Follow-up questions [2] showed that the usage of PSS was very workload specific. Minchan has noted some use cases as well, but they were not very specific either.
>
> Anyway, I see what you mean about PSS being iffy for user-space OOM processing (because PSS doesn't tell you how much memory you get back in exchange for killing a given process at a particular moment). We're not using it like that. Instead, we're using the PSS samples we collect asynchronously for system-management tasks like fine-tuning oom_score_adj, memory use tracking for debugging, application-level memory-use attribution, and deciding whether we want to kill large processes during system idle maintenance windows. Android has been using PSS for these purposes for a long time; as the average process VMA count has increased and devices have become more efficiency-conscious, PSS-collection inefficiency has started to matter more. IMHO, it'd be a lot safer to optimize the existing PSS-collection model, which has been fine-tuned over the years, than to change the memory tracking approach entirely to work around smaps-generation inefficiency.

This is really vague. Please be more specific.
-- 
Michal Hocko
SUSE Labs