Re: [PATCH 00/40] Memory allocation profiling
From: Michal Hocko <mhocko@suse.com>
Date: 2023-05-03 07:27:09
Also in:
cgroups, linux-arch, linux-doc, linux-fsdevel, linux-iommu, linux-mm, lkml
On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
Memory allocation profiling infrastructure provides a low overhead mechanism to make all kernel allocations in the system visible. It can be used to monitor memory usage, track memory hotspots, detect memory leaks, identify memory regressions. To keep the overhead to the minimum, we record only allocation sizes for every allocation in the codebase. With that information, if users are interested in more detailed context for a specific allocation, they can enable in-depth context tracking, which includes capturing the pid, tgid, task name, allocation size, timestamp and call stack for every allocation at the specified code location.
[...]
Implementation utilizes a more generic concept of code tagging, introduced as part of this patchset. Code tag is a structure identifying a specific location in the source code which is generated at compile time and can be embedded in an application-specific structure. A number of applications for code tagging have been presented in the original RFC [1]. Code tagging uses the old trick of "define a special elf section for objects of a given type so that we can iterate over them at runtime" and creates a proper library for it. To profile memory allocations, we instrument page, slab and percpu allocators to record total memory allocated in the associated code tag at every allocation in the codebase. Every time an allocation is performed by an instrumented allocator, the code tag at that location increments its counter by allocation size. Every time the memory is freed the counter is decremented. To decrement the counter upon freeing, allocated object needs a reference to its code tag. Page allocators use page_ext to record this reference while slab allocators use memcg_data (renamed into more generic slabobj_ext) of the slab page.
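For readers unfamiliar with the ELF section trick described above, here is a minimal userspace sketch of the idea. It is not the patchset's code: struct alloc_tag, obj_hdr, my_alloc(), tagged_alloc(), tagged_free() and total_tagged_bytes() are hypothetical names made up for illustration. It assumes GCC or Clang (statement expressions and the section/used attributes are GNU extensions) and an ELF linker such as GNU ld or lld, which provide __start_<name>/__stop_<name> symbols for sections whose names are valid C identifiers. A small header in front of each object stands in for the page_ext/slabobj_ext reference mentioned in the cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a code tag: one static instance per call site. */
struct alloc_tag {
	const char *file;
	size_t bytes;		/* bytes currently allocated from this site */
	int line;
};

/* Per-object back-reference to the tag, so free can decrement the counter. */
struct obj_hdr {
	struct alloc_tag *tag;
	size_t size;
};

/* Provided by the linker (assumption: GNU ld or lld on an ELF target). */
extern struct alloc_tag __start_alloc_tags[];
extern struct alloc_tag __stop_alloc_tags[];

static inline void *tagged_alloc(struct alloc_tag *tag, size_t size)
{
	struct obj_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->tag = tag;
	h->size = size;
	tag->bytes += size;	/* charge the call site */
	return h + 1;
}

static inline void tagged_free(void *p)
{
	struct obj_hdr *h;

	if (!p)
		return;
	h = (struct obj_hdr *)p - 1;
	assert(h->tag->bytes >= h->size);
	h->tag->bytes -= h->size;	/* uncharge the call site */
	free(h);
}

/*
 * Emit one tag per call site into the "alloc_tags" section. The explicit
 * alignment keeps the compiler from padding entries, so the section can be
 * walked as a plain array.
 */
#define my_alloc(size) ({						\
	static struct alloc_tag _tag					\
	__attribute__((section("alloc_tags"), used,			\
		       aligned(__alignof__(struct alloc_tag)))) =	\
		{ .file = __FILE__, .line = __LINE__ };			\
	tagged_alloc(&_tag, (size)); })

/* Walk every tag the linker collected and sum the live bytes. */
static inline size_t total_tagged_bytes(void)
{
	size_t total = 0;

	for (struct alloc_tag *t = __start_alloc_tags; t < __stop_alloc_tags; t++)
		total += t->bytes;
	return total;
}

/* Print a per-call-site report, one line per tag. */
static inline void report_tags(FILE *out)
{
	for (struct alloc_tag *t = __start_alloc_tags; t < __stop_alloc_tags; t++)
		fprintf(out, "%s:%d %zu bytes\n", t->file, t->line, t->bytes);
}
```

Each my_alloc() call site gets its own counter at compile time, and report_tags() can list all of them at runtime without any registration step. At least one my_alloc() call must exist in the program, otherwise the section is empty and the __start/__stop symbols are not defined.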
[...]
[1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@google.com/
[...]
70 files changed, 2765 insertions(+), 554 deletions(-)
Sorry for cutting the cover letter considerably, but I believe I have quoted the most important/interesting parts here.
The approach is not fundamentally different from the previous version [1], and there was a significant discussion around it. The cover letter neither summarizes nor deals with the concerns expressed previously, AFAICS. So let me bring those back up. At least those I find the most important:
- This is a big change and it adds a significant maintenance burden because each allocation entry point needs to be handled specifically. The cost will grow with the intended coverage, especially where the allocation is hidden in library code.
- It has been brought up that this duplicates functionality already available via the existing tracing infrastructure. You should make it very clear why that is not suitable for the job.
- We already have the page_owner infrastructure that provides allocation tracking data. Why can't it be used/extended?
Thanks!
--
Michal Hocko
SUSE Labs