Thread (160 messages) 160 messages, 20 authors, 2023-05-08

Re: [PATCH 00/40] Memory allocation profiling

From: Michal Hocko <mhocko@suse.com>
Date: 2023-05-03 07:27:09
Also in: cgroups, linux-arch, linux-doc, linux-fsdevel, linux-iommu, linux-mm, lkml

On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
Memory allocation profiling infrastructure provides a low overhead
mechanism to make all kernel allocations in the system visible. It can be
used to monitor memory usage, track memory hotspots, detect memory leaks,
identify memory regressions.

To keep the overhead to the minimum, we record only allocation sizes for
every allocation in the codebase. With that information, if users are
interested in more detailed context for a specific allocation, they can
enable in-depth context tracking, which includes capturing the pid, tgid,
task name, allocation size, timestamp and call stack for every allocation
at the specified code location.
[...]
Implementation utilizes a more generic concept of code tagging, introduced
as part of this patchset. Code tag is a structure identifying a specific
location in the source code which is generated at compile time and can be
embedded in an application-specific structure. A number of applications
for code tagging have been presented in the original RFC [1].
Code tagging uses the old trick of "define a special elf section for
objects of a given type so that we can iterate over them at runtime" and
creates a proper library for it. 

To profile memory allocations, we instrument page, slab and percpu
allocators to record total memory allocated in the associated code tag at
every allocation in the codebase. Every time an allocation is performed by
an instrumented allocator, the code tag at that location increments its
counter by allocation size. Every time the memory is freed the counter is
decremented. To decrement the counter upon freeing, allocated object needs
a reference to its code tag. Page allocators use page_ext to record this
reference while slab allocators use memcg_data (renamed into more generic
slabobj_ext) of the slab page.
[...]
[1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/ (local)
[...]
 70 files changed, 2765 insertions(+), 554 deletions(-)
Sorry for cutting the cover considerably but I believe I have quoted the
most important/interesting parts here. The approach is not fundamentally
different from the previous version [1] and there was a significant
discussion around this approach. The cover letter doesn't summarize nor
deal with concerns expressed previous AFAICS. So let me bring those up
back. At least those I find the most important:
- This is a big change and it adds a significant maintenance burden
  because each allocation entry point needs to be handled specifically.
  The cost will grow with the intended coverage especially there when
  allocation is hidden in a library code.
- It has been brought up that this is duplicating functionality already
  available via existing tracing infrastructure. You should make it very
  clear why that is not suitable for the job
- We already have page_owner infrastructure that provides allocation
  tracking data. Why it cannot be used/extended?

Thanks!
-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help