Re: [PATCH v2 00/39] Memory allocation profiling

[PATCH v2 00/39] Memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size() · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 03/39] fs: Convert alloc_inode_sb() to a macro · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 02/39] scripts/kallysms: Always include __start and __stop symbols · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 04/39] nodemask: Split out include/linux/nodemask_types.h · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 05/39] prandom: Remove unused include · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 06/39] mm: enumerate all gfp flags · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
Re: [PATCH v2 06/39] mm: enumerate all gfp flags · Petr Tesařík <hidden> · 2023-10-25
Re: [PATCH v2 06/39] mm: enumerate all gfp flags · Suren Baghdasaryan <surenb@google.com> · 2023-10-25
Re: [PATCH v2 06/39] mm: enumerate all gfp flags · Petr Tesařík <hidden> · 2023-10-28
[PATCH v2 07/39] mm: introduce slabobj_ext to support slab object extensions · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 08/39] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 09/39] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 10/39] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 11/39] slab: objext: introduce objext_flags as extension to page_memcg_data_flags · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 12/39] lib: code tagging framework · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 13/39] lib: code tagging module support · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 14/39] lib: prevent module unloading if memory is not freed · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 15/39] lib: add allocation tagging support for memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 16/39] lib: introduce support for page allocation tagging · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 17/39] change alloc_pages name in dma_map_ops to avoid name conflicts · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 18/39] change alloc_pages name in ivpu_bo_ops to avoid conflicts · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 19/39] mm: enable page allocation tagging · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 20/39] mm: create new codetag references during page splitting · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 21/39] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 22/39] lib: add codetag reference into slabobj_ext · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 23/39] mm/slab: add allocation accounting into slab allocation and free paths · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 24/39] mm/slab: enable slab allocation tagging for kmalloc and friends · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 25/39] mm/slub: Mark slab_free_freelist_hook() __always_inline · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 26/39] mempool: Hook up to memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 27/39] xfs: Memory allocation profiling fixups · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 28/39] timekeeping: Fix a circular include dependency · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · Thomas Gleixner <hidden> · 2023-10-25
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · Suren Baghdasaryan <surenb@google.com> · 2023-10-26
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · Thomas Gleixner <hidden> · 2023-10-26
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · Kent Overstreet <kent.overstreet@linux.dev> · 2023-10-26
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · "Arnd Bergmann" <arnd@arndb.de> · 2023-10-27
Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency · Nick Desaulniers <hidden> · 2023-10-27
[PATCH v2 29/39] mm: percpu: Introduce pcpuobj_ext · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 30/39] mm: percpu: Add codetag reference into pcpuobj_ext · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 31/39] mm: percpu: enable per-cpu allocation tagging · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 32/39] arm64: Fix circular header dependency · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 34/39] rhashtable: Plumb through alloc tag · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 33/39] mm: vmalloc: Enable memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 35/39] lib: add memory allocations report in show_mem() · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 36/39] codetag: debug: skip objext checking when it's for objext itself · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 37/39] codetag: debug: mark codetags for reserved pages as empty · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 38/39] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
[PATCH v2 39/39] MAINTAINERS: Add entries for code tagging and memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24
Re: [PATCH v2 00/39] Memory allocation profiling · Roman Gushchin <roman.gushchin@linux.dev> · 2023-10-24
Re: [PATCH v2 00/39] Memory allocation profiling · Suren Baghdasaryan <surenb@google.com> · 2023-10-24

From: Suren Baghdasaryan <surenb@google.com>
Date: 2023-10-24 18:39:08
Also in: cgroups, linux-arch, linux-doc, linux-fsdevel, linux-iommu, linux-mm, lkml

On Tue, Oct 24, 2023 at 11:29 AM Roman Gushchin
[off-list ref] wrote:

On Tue, Oct 24, 2023 at 06:45:57AM -0700, Suren Baghdasaryan wrote:

quoted

Updates since the last version [1]
- Simplified allocation tagging macros;
- Runtime enable/disable sysctl switch (/proc/sys/vm/mem_profiling)
instead of kernel command-line option;
- CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT to select default enable state;
- Changed the user-facing API from debugfs to procfs (/proc/allocinfo);
- Removed context capture support to make patch incremental;
- Renamed uninstrumented allocation functions to use _noprof suffix;
- Added __GFP_LAST_BIT to make the code cleaner;
- Removed lazy per-cpu counters; it turned out the memory savings were
minimal and not worth the performance impact;
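
As an illustration of the runtime switch mentioned in the list above, here is a minimal sketch of toggling it from user space. Only the path /proc/sys/vm/mem_profiling comes from the cover letter; the accepted values ("1" to enable, "0" to disable) and the helper names are assumptions made for this example, and writing to the real file would need root on a kernel carrying this series.

```python
from pathlib import Path

# Path taken from the cover letter.
SYSCTL_PATH = Path("/proc/sys/vm/mem_profiling")


def set_profiling(enabled, path=SYSCTL_PATH):
    """Write the switch value.

    Assumption: the sysctl accepts "1" (enable) and "0" (disable);
    the cover letter does not spell out the value format.
    """
    Path(path).write_text("1\n" if enabled else "0\n")


def profiling_enabled(path=SYSCTL_PATH):
    """Return True if the switch currently reads back as "1"."""
    return Path(path).read_text().strip() == "1"
```

The path is a parameter so the helpers can be pointed at an ordinary file when trying them out on a machine without the patches applied.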

Hello Suren,

quoted

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below is performance
comparison between the baseline kernel, profiling when enabled, profiling
when disabled and (for comparison purposes) baseline with
CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:

                        kmalloc                 pgalloc
(1 baseline)            12.041s                 49.190s
(2 default disabled)    14.970s (+24.33%)       49.684s (+1.00%)
(3 default enabled)     16.859s (+40.01%)       56.287s (+14.43%)
(4 runtime enabled)     16.983s (+41.04%)       55.760s (+13.36%)
(5 memcg)               33.831s (+180.96%)      51.433s (+4.56%)
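
For reference, the percentages in the table are the slowdown relative to the baseline row, (measured - baseline) / baseline. A small sketch that recomputes them from the timings copied out of the table; the function and dictionary names are made up for this example.

```python
def overhead_pct(baseline_s, measured_s):
    """Slowdown of measured_s relative to baseline_s, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100.0


# Timings in seconds, copied from the table: (kmalloc, pgalloc).
TIMINGS = {
    "baseline": (12.041, 49.190),
    "default disabled": (14.970, 49.684),
    "default enabled": (16.859, 56.287),
    "runtime enabled": (16.983, 55.760),
    "memcg": (33.831, 51.433),
}


def overhead_table(timings, baseline_key="baseline"):
    """Map each non-baseline configuration to (kmalloc %, pgalloc %)."""
    base = timings[baseline_key]
    return {
        name: tuple(overhead_pct(b, m) for b, m in zip(base, values))
        for name, values in timings.items()
        if name != baseline_key
    }


if __name__ == "__main__":
    for name, (kmalloc, pgalloc) in overhead_table(TIMINGS).items():
        print(f"{name:18s} kmalloc +{kmalloc:.2f}%  pgalloc +{pgalloc:.2f}%")
```

Because the published timings are rounded to milliseconds, a recomputed value can differ from the table in the last digit.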

Some recent changes [1] to the kmem accounting should have made it quite a bit
faster. It would be great if you could provide new numbers for the comparison,
maybe with the next revision?

And btw, thank you (and Kent): your numbers inspired me to do this kmemcg
performance work. I expect it to still be about twice as expensive as your
stuff, because on the memcg side we handle charging and statistics separately,
but hopefully the difference will be smaller.

Yes, I saw them! Well done! I'll definitely update my numbers once the
patches land in their final form.

Thank you!

Thank you for the optimizations!

[1]:
  patches from next tree, so no stable hashes:
    mm: kmem: reimplement get_obj_cgroup_from_current()
    percpu: scoped objcg protection
    mm: kmem: scoped objcg protection
    mm: kmem: make memcg keep a reference to the original objcg
    mm: kmem: add direct objcg pointer to task_struct
    mm: kmem: optimize get_obj_cgroup_from_current()
