Re: [PATCH v3 00/76] Optimize list lru memory consumption

[PATCH v3 00/76] Optimize list lru memory consumption · Muchun Song <hidden> · 2021-09-14
[PATCH v3 01/76] mm: list_lru: fix the return value of list_lru_count_one() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 02/76] mm: memcontrol: remove kmemcg_id reparenting · Muchun Song <hidden> · 2021-09-14
[PATCH v3 03/76] mm: memcontrol: remove the kmem states · Muchun Song <hidden> · 2021-09-14
[PATCH v3 04/76] mm: memcontrol: move memcg_online_kmem() to mem_cgroup_css_online() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 05/76] mm: list_lru: remove holding lru lock · Muchun Song <hidden> · 2021-09-14
[PATCH v3 06/76] mm: list_lru: only add memcg-aware lrus to the global lru list · Muchun Song <hidden> · 2021-09-14
[PATCH v3 07/76] mm: list_lru: optimize memory consumption of arrays · Muchun Song <hidden> · 2021-09-14
[PATCH v3 08/76] mm: introduce kmem_cache_alloc_lru · Muchun Song <hidden> · 2021-09-14
[PATCH v3 09/76] fs: introduce alloc_inode_sb() to allocate filesystems specific inode · Muchun Song <hidden> · 2021-09-14
[PATCH v3 10/76] dax: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 11/76] 9p: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 12/76] adfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 13/76] affs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 14/76] afs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 15/76] befs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 16/76] bfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 17/76] block: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 18/76] btrfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 19/76] ceph: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 20/76] cifs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 21/76] coda: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 22/76] ecryptfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 23/76] efs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 24/76] erofs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 25/76] exfat: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 26/76] ext2: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 27/76] ext4: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 28/76] fat: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 29/76] freevxfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 30/76] fuse: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 31/76] gfs2: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 32/76] hfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 33/76] hfsplus: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 34/76] hostfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 35/76] hpfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 36/76] hugetlbfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 37/76] isofs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 38/76] jffs2: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 39/76] jfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 40/76] minix: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 41/76] nfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 42/76] nilfs2: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 46/76] orangefs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 47/76] overlayfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 48/76] proc: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 49/76] qnx4: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 50/76] qnx6: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 51/76] reiserfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 52/76] romfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 53/76] squashfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 54/76] sysv: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 55/76] ubifs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 43/76] ntfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 44/76] ocfs2: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 45/76] openpromfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 56/76] udf: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 57/76] ufs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 58/76] vboxsf: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 59/76] xfs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 60/76] zonefs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 61/76] ipc: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 62/76] shmem: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 63/76] net: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 64/76] rpc: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 65/76] f2fs: allocate inode by using alloc_inode_sb() · Muchun Song <hidden> · 2021-09-14
[PATCH v3 66/76] nfs42: use a specific kmem_cache to allocate nfs4_xattr_entry · Muchun Song <hidden> · 2021-09-14
[PATCH v3 67/76] mm: dcache: use kmem_cache_alloc_lru() to allocate dentry · Muchun Song <hidden> · 2021-09-14
[PATCH v3 68/76] xarray: use kmem_cache_alloc_lru to allocate xa_node · Muchun Song <hidden> · 2021-09-14
[PATCH v3 69/76] mm: workingset: use xas_set_lru() to pass shadow_nodes · Muchun Song <hidden> · 2021-09-14
[PATCH v3 70/76] mm: list_lru: allocate list_lru_one only when needed · Muchun Song <hidden> · 2021-09-14
[PATCH v3 71/76] mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus · Muchun Song <hidden> · 2021-09-14
[PATCH v3 72/76] mm: list_lru: replace linear array with xarray · Muchun Song <hidden> · 2021-09-14
[PATCH v3 73/76] mm: memcontrol: reuse memory cgroup ID for kmem ID · Muchun Song <hidden> · 2021-09-14
[PATCH v3 74/76] mm: memcontrol: fix cannot alloc the maximum memcg ID · Muchun Song <hidden> · 2021-09-14
[PATCH v3 75/76] mm: list_lru: rename list_lru_per_memcg to list_lru_memcg · Muchun Song <hidden> · 2021-09-14
[PATCH v3 76/76] mm: memcontrol: rename memcg_cache_id to memcg_kmem_id · Muchun Song <hidden> · 2021-09-14
Re: [PATCH v3 00/76] Optimize list lru memory consumption · "Theodore Ts'o" <tytso@mit.edu> · 2021-09-14
Re: [PATCH v3 00/76] Optimize list lru memory consumption · Muchun Song <hidden> · 2021-09-15
Re: [PATCH v3 00/76] Optimize list lru memory consumption · Kari Argillander <hidden> · 2021-09-18
Re: [PATCH v3 00/76] Optimize list lru memory consumption · Muchun Song <hidden> · 2021-09-18

From: Muchun Song <hidden>
Date: 2021-09-18 08:00:05
Also in: linux-fsdevel, linux-mm, lkml

On Sat, Sep 18, 2021 at 2:56 PM Kari Argillander
[off-list ref] wrote:

On Tue, Sep 14, 2021 at 03:28:22PM +0800, Muchun Song wrote:

We introduced alloc_inode_sb() in version 2. It sets up the inode reclaim
context properly when allocating a filesystem-specific inode, so all
filesystems have to be converted to the new API, which was done in one
patch. Some filesystems are easy to convert (just replace
kmem_cache_alloc() with alloc_inode_sb()), while other filesystems need
more work. To make it easier for the maintainers of each filesystem to
review their own part, I split that patch into per-filesystem patches in
this version. I am not sure whether this is a good idea, because it
results in more commits.

On our server, we found a suspected memory leak. The kmalloc-32 slab cache
consumes more than 6GB of memory, while the other kmem_caches consume less
than 2GB.

After in-depth analysis, we found that the memory consumption of the
kmalloc-32 slab cache is caused by list_lru_one allocations.

  crash> p memcg_nr_cache_ids
  memcg_nr_cache_ids = $2 = 24574

memcg_nr_cache_ids is very large, and the memory consumption of each
list_lru can be calculated with the following formula:

  num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32)

There are 4 numa nodes in our system, so each list_lru consumes ~3MB.

  crash> list super_blocks | wc -l
  952

Every mount registers 2 list_lrus, one for inodes and one for dentries.
There are 952 super_blocks, so the total memory is 952 * 2 * 3 MB
(~5.6GB). The number of memory cgroups is currently less than 500, so I
guess more than 12286 memory cgroups were created on this machine at some
point (I do not know why there were so many cgroups; it may be a user's
bug, or the user may really want to do that). Because memcg_nr_cache_ids
has not been reduced to a suitable value, a lot of memory is wasted. The
only way to reduce memcg_nr_cache_ids is to *reboot* the server, which is
not what we want.

To reduce memcg_nr_cache_ids, I previously posted a patchset [1], but it
did not fundamentally solve the problem.

We currently allocate scope for every memcg to be able to be tracked on every
superblock instantiated in the system, regardless of whether that superblock
is even accessible to that memcg.

These huge memcg counts come from container hosts where memcgs are confined
to just a small subset of the total number of superblocks that are
instantiated at any given point in time.

For these systems with huge container counts, list_lru does not need the
capability of tracking every memcg on every superblock.

What it comes down to is that the list_lru is only needed for a given memcg
if that memcg is instantiating and freeing objects on a given list_lru.

As Dave said, "Which makes me think we should be moving more towards 'add the
memcg to the list_lru at the first insert' model rather than 'instantiate
all at memcg init time just in case'."

This patchset aims to optimize the list lru memory consumption from different
aspects.

Patches 1-6 are code simplifications.
Patch 7 converts the array from per-memcg per-node to per-memcg.
Patch 8 introduces kmem_cache_alloc_lru().
Patch 9 introduces alloc_inode_sb().
Patches 10-66 convert all filesystems to alloc_inode_sb() respectively.

Nowadays there is also ntfs3. If you do not plan to convert it, please
at least CC me so that I can do it when this lands.

  Argillander

Wow, a new filesystem. I didn't notice it before. I'll cover it
in the next version and Cc you in case you can review it.
Thanks for the reminder.
