Thread: 20 messages, 5 authors, 2021-10-01

Re: [PATCH v9 2/3] mm: add a field to store names for private anonymous memory

From: Suren Baghdasaryan <surenb@google.com>
Date: 2021-09-30 18:56:26
Also in: linux-fsdevel, linux-mm, lkml

On Wed, Sep 8, 2021 at 9:05 PM Suren Baghdasaryan [off-list ref] wrote:
On Mon, Sep 6, 2021 at 9:57 AM Matthew Wilcox [off-list ref] wrote:
On Thu, Sep 02, 2021 at 04:18:12PM -0700, Suren Baghdasaryan wrote:
On Android we heavily use a set of tools that use an extended version of
the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
in userspace and slice their usage by process, shared (COW) vs.  unique
mappings, backing, etc.  This can account for real physical memory usage
even in cases like fork without exec (which Android uses heavily to share
as many private COW pages as possible between processes), Kernel SamePage
Merging, and clean zero pages.  It produces a measurement of the pages
that only exist in that process (USS, for unique), and a measurement of
the physical memory usage of that process with the cost of shared pages
being evenly split between processes that share them (PSS).
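
The USS and PSS definitions above can be illustrated with a minimal sketch. It assumes a toy input that maps each physical page to the set of processes mapping it; the function name, the page size constant, and the sample data are hypothetical and are not taken from the Android tools or from the pagemap interface.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # bytes; assumed page size for this illustration


def uss_and_pss(page_sharers):
    """Return (uss, pss) in bytes per pid.

    page_sharers is a hypothetical input: physical page id -> set of pids
    that currently map that page.
    """
    uss = defaultdict(int)
    pss = defaultdict(float)
    for sharers in page_sharers.values():
        if not sharers:
            continue  # page not mapped by any tracked process
        share = PAGE_SIZE / len(sharers)  # cost split evenly between sharers
        for pid in sharers:
            pss[pid] += share
            if len(sharers) == 1:
                uss[pid] += PAGE_SIZE  # page exists only in this process
    return dict(uss), dict(pss)


# Hypothetical data: pages 0 and 2 are private, pages 1 and 3 are shared.
pages = {0: {1}, 1: {1, 2}, 2: {2}, 3: {1, 2}}
uss, pss = uss_and_pss(pages)
```

Summing PSS over all processes gives the total physical memory in use by the tracked pages, which is why it is a convenient per-process measure on a system that shares many COW pages.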

If all anonymous memory is indistinguishable then figuring out the real
physical memory usage (PSS) of each heap requires either a pagemap walking
tool that can understand the heap debugging of every layer, or for every
layer's heap debugging tools to implement the pagemap walking logic, in
which case it is hard to get a consistent view of memory across the whole
system.

Tracking the information in userspace leads to all sorts of problems.
It either needs to be stored inside the process, which means every
process has to have an API to export its current heap information upon
request, or it has to be stored externally in a filesystem that
somebody needs to clean up on crashes.  It needs to be readable while
the process is still running, so it has to have some sort of
synchronization with every layer of userspace.  Efficiently tracking
the ranges requires reimplementing something like the kernel vma
trees, and linking to it from every layer of userspace.  It requires
more memory, more syscalls, more runtime cost, and more complexity to
separately track regions that the kernel is already tracking.
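
With names stored in the kernel, a tool can instead read them straight from /proc. The following is a rough sketch, assuming named regions appear in /proc/<pid>/smaps as `[anon:<name>]` as proposed in this patch series. The function name, the `[anon]` label used for unnamed mappings, and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Mapping header line: address range, perms, offset, dev, inode, optional name.
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")


def pss_by_name(smaps_text):
    """Sum the Pss field (in kB) per mapping name in smaps-formatted text."""
    totals = defaultdict(int)
    name = None
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            # Unnamed mappings have an empty name column; "[anon]" is a
            # label chosen for this sketch, not something the kernel prints.
            name = m.group(1).strip() or "[anon]"
            continue
        if line.startswith("Pss:") and name is not None:
            totals[name] += int(line.split()[1])
    return dict(totals)


# Hypothetical smaps excerpt with two named regions and one unnamed region.
SAMPLE = """\
7f1000000000-7f1000004000 rw-p 00000000 00:00 0                          [anon:libc_malloc]
Size:                 16 kB
Pss:                  12 kB
7f1000004000-7f1000008000 rw-p 00000000 00:00 0                          [anon:libc_malloc]
Pss:                   8 kB
7f2000000000-7f2000001000 rw-p 00000000 00:00 0
Pss:                   4 kB
"""

totals = pss_by_name(SAMPLE)
```

On a real system the same function could be fed the contents of /proc/<pid>/smaps for each process, with no cooperation needed from the process being inspected.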

I understand that the information is currently incoherent, but why is
this the right way to make it coherent?  It would seem more useful to
use something like one of the tracing mechanisms (e.g. ftrace, LTTng,
whatever the current hotness is in userspace tracing) for the malloc
library to log all the useful information, instead of injecting a subset
of it into the kernel for userspace to read out again.

Sorry for the delay with the response. I'm travelling and my internet
access is very patchy.

Just to clarify, your suggestion is to require userspace to log any
allocation using ftrace or a similar mechanism and then for the system
to parse these logs to calculate the memory usage for each process?
I didn't think much in this direction but I guess logging each
allocation in the system and periodically collecting that data would
be quite expensive both from memory usage and performance POV. I'll
need to think a bit more but these are to me the obvious downsides of
this approach.

Sorry for the delay again. Now that I'm back there should not be any
more of them.

I thought more about these alternative suggestions for userspace to
record allocations, but that would introduce considerable complexity
into userspace. A daemon would have to collect and consolidate this
data, all users would have to query it for the data (via IPC or
something similar), and if the daemon crashes the data would need to
be recovered somehow. So, in short, it's possible but makes things
much more complex compared to the proposed in-kernel implementation.

OTOH, the only downside of the current implementation is the
additional memory required to store anon vma names. I checked the
memory consumption on the latest Android with these patches and,
because we share vma names during fork, the actual memory required to
store vma names is no more than 600kB. Even on older phones like Pixel
3 with 4GB RAM, this is less than 0.015% of total memory. IMHO, this
is an acceptable price to pay.
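
The overhead figure can be checked with a few lines of arithmetic. This sketch assumes binary units (1 kB = 1024 bytes, 1 GB = 1024^3 bytes); the variable names are illustrative only.

```python
KIB = 1024

name_bytes = 600 * KIB   # upper bound on memory used by anon vma names
ram_bytes = 4 * KIB**3   # 4GB of RAM, as on the Pixel 3

overhead_percent = 100 * name_bytes / ram_bytes
print(f"{overhead_percent:.4f}%")  # prints "0.0143%"
```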