Re: [PATCH RFC v2] Add /proc/pid/smaps_rollup
From: Michal Hocko <mhocko@kernel.org>
Date: 2017-08-24 08:55:53
Also in:
linux-fsdevel, linux-mm, lkml
Sorry for a late reply On Thu 10-08-17 12:17:07, Tim Murray wrote:
I've looked into this a fair bit on the Android side, so I can provide some context. There are two main reasons why Android gathers PSS information: 1. Android devices can show the user the amount of memory used per application via the settings app. This is a less important use case.
yes
2. We log PSS to help identify leaks in applications. We have found an
enormous number of bugs (in the Android platform, in Google's own
apps, and in third-party applications) using this data.
To do this, system_server (the main process in Android userspace) will
sample the PSS of a process three seconds after it changes state (for
example, app is launched and becomes the foreground application) and
about every ten minutes after that. The net result is that PSS
collection is regularly running on at least one process in the system
(usually a few times a minute while the screen is on, less when screen
is off due to suspend). PSS of a process is an incredibly useful stat
to track, and we aren't going to get rid of it. We've looked at some
very hacky approaches using RSS ("take the RSS of the target process,
subtract the RSS of the zygote process that is the parent of all
Android apps") to reduce the accounting time, but it regularly
overestimated the memory used by 20+ percent. Accordingly, I don't
think that there's a good alternative to using PSS.Even if the RSS overestimates this shouldn't hide a memory leak, no?
We started looking into PSS collection performance after we noticed random frequency spikes while a phone's screen was off; occasionally, one of the CPU clusters would ramp to a high frequency because there was 200-300ms of constant CPU work from a single thread in the main Android userspace process. The work causing the spike (which is reasonable governor behavior given the amount of CPU time needed) was always PSS collection. As a result, Android is burning more power than we should be on PSS collection.
Yes, this really sucks but we are revolving around the same point. It really sucks that we burn so much time just copying the output to the userspace when the real stuff (vma walk and pte walk) has to be done anyway. AFAIR I could reduce the overhead by using more appropriate seq_* functions but maybe we can do even better.
The other issue (and why I'm less sure about improving smaps as a long-term solution) is that the number of VMAs per process has increased significantly from release to release. After trying to figure out why we were seeing these 200-300ms PSS collection times on Android O but had not noticed it in previous versions, we found that the number of VMAs in the main system process increased by 50% from Android N to Android O (from ~1800 to ~2700) and varying increases in every userspace process. Android M to N also had an increase in the number of VMAs, although not as much. I'm not sure why this is increasing so much over time, but thinking about ASLR and ways to make ASLR better, I expect that this will continue to increase going forward. I would not be surprised if we hit 5000 VMAs on the main Android process (system_server) by 2020.
The thing is, however, that the larger amount of VMAs will also mean more work on the kernel side. The data collection has to be done anyway.
If we assume that the number of VMAs is going to increase over time, then doing anything we can do to reduce the overhead of each VMA during PSS collection seems like the right way to go, and that means outputting an aggregate statistic (to avoid whatever overhead there is per line in writing smaps and in reading each line from userspace). Also, Dan sent me some numbers from his benchmark measuring PSS on system_server (the big Android process) using smaps vs smaps_rollup: using smaps: iterations:1000 pid:1163 pss:220023808 0m29.46s real 0m08.28s user 0m20.98s system using smaps_rollup: iterations:1000 pid:1163 pss:220702720 0m04.39s real 0m00.03s user 0m04.31s system
I would assume we would do all we can to reduce this kernel->user overhead first before considering a new user visible file. I haven't seen any attempts except from the low hanging fruid I have tried. -- Michal Hocko SUSE Labs