Re: [External] Re: [PATCH 0/9] Shrink the list lru size on memory cgroup removal

From: Dave Chinner <david@fromorbit.com>
Date: 2021-05-05 01:13:40
Also in: linux-fsdevel, lkml

On Mon, May 03, 2021 at 02:33:21PM +0800, Muchun Song wrote:

On Mon, May 3, 2021 at 7:58 AM Dave Chinner [off-list ref] wrote:

If the user wants to insert the allocated object into its lru list in
the future, the user should use list_lru_kmem_cache_alloc() instead of
kmem_cache_alloc().
I have looked at the code closely. There are 3 different kmem_caches that
need to use this new API to allocate memory. They are inode_cachep,
dentry_cache and radix_tree_node_cachep. I think that it is easy to migrate.

It might work, but I think you may have overlooked the complexity
of inode allocation for filesystems. i.e.  alloc_inode() calls out
to filesystem allocation functions more often than it allocates
directly from the inode_cachep.  i.e.  Most filesystems provide
their own ->alloc_inode superblock operation, and they allocate
inodes out of their own specific slab caches, not the inode_cachep.

I didn't realize this before. You are right. Most filesystems
have their own kmem_cache instead of inode_cachep.
We would need to change a lot of filesystem-specific code.
Thanks for your reminder.

And then you have filesystems like XFS, where alloc_inode() will
never be called, and which implement ->alloc_inode as:

/* Catch misguided souls that try to use this interface on XFS */
STATIC struct inode *
xfs_fs_alloc_inode(
        struct super_block      *sb)
{
        BUG();
        return NULL;
}

Because all the inode caching and allocation is internal to XFS and
VFS inode management interfaces are not used.

So I suspect that an external wrapper function is not the way to go
here - either internalising the LRU management into the slab
allocation or adding the memcg code to alloc_inode() and filesystem
specific routines would make a lot more sense to me.

Sure. If we introduce kmem_cache_alloc_lru, all filesystems
need to migrate to kmem_cache_alloc_lru. I cannot figure out
an approach that does not need to change filesystems code.

Right, I don't think there's a way to avoid changing all the
filesystem code if we are touching the cache allocation routines.
However, if we hide it all inside the allocation routine, then
the change to each filesystem is effectively just a 1-liner like:

-	inode = kmem_cache_alloc(inode_cache, GFP_NOFS);
+	inode = kmem_cache_alloc_lru(inode_cache, &sb->s_inode_lru, GFP_NOFS);

Or perhaps, define a generic wrapper function like:

static inline void *
alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
{
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

And then each filesystem ends up with:

-	inode = kmem_cache_alloc(inode_cache, GFP_NOFS);
+	inode = alloc_inode_sb(sb, inode_cache, GFP_NOFS);

so that all the superblock LRU stuff is also hidden from the
filesystems...
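
To make the shape of that concrete, here is a rough userspace mock-up of
the wrapper idea. It is only a sketch, not kernel code: the struct
definitions, the GFP_NOFS value, the nr_allocs counter, the myfs_* names
and the body of kmem_cache_alloc_lru() are all stand-ins I've assumed for
illustration, and the mock assumes the list_lru is embedded in the
superblock so its address is what gets passed down. The point is just
that the filesystem only ever sees the superblock and its own cache.

```c
/* Userspace mock only: every type and function here is an assumed
 * stand-in for illustration, not the kernel's real definition. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int gfp_t;		/* stand-in for the kernel type */
#define GFP_NOFS 0x1u			/* arbitrary value for the mock */

struct list_lru { unsigned long nr_allocs; };	/* mock: counts objects tied to this LRU */
struct kmem_cache { size_t object_size; };	/* mock slab cache */
struct super_block { struct list_lru s_inode_lru; };

/* Hypothetical allocator entry point: allocate the object and let the
 * allocator know which LRU the object will later be inserted into. */
static void *
kmem_cache_alloc_lru(struct kmem_cache *cache, struct list_lru *lru, gfp_t gfp)
{
	void *obj = calloc(1, cache->object_size);

	(void)gfp;			/* unused in the mock */
	if (obj)
		lru->nr_allocs++;	/* stand-in for the per-LRU memcg setup */
	return obj;
}

/* Generic wrapper: hides which LRU is used from the filesystem. */
static inline void *
alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
{
	return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
}

/* Filesystem side (hypothetical "myfs"): knows nothing about the LRU. */
struct myfs_inode { unsigned long ino; };

static struct myfs_inode *
myfs_alloc_inode(struct super_block *sb, struct kmem_cache *cache)
{
	return alloc_inode_sb(sb, cache, GFP_NOFS);
}
```

With something like this, a filesystem's allocation path only changes in
the one call it already makes, and any later change to how the LRU is
found stays inside alloc_inode_sb().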

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
