Re: [External] Re: [PATCH v2 17/21] mm: list_lru: replace linear array with xarray
From: Muchun Song <hidden>
Date: 2021-05-28 03:44:12
Also in:
linux-fsdevel, linux-nfs, lkml
On Thu, May 27, 2021 at 8:08 PM Matthew Wilcox [off-list ref] wrote:
On Thu, May 27, 2021 at 02:21:44PM +0800, Muchun Song wrote:quoted
If we run 10k containers in the system, the size of the list_lru_memcg->lrus can be ~96KB per list_lru. When we decrease the number containers, the size of the array will not be shrinked. It is not scalable. The xarray is a good choice for this case. We can save a lot of memory when there are tens of thousands continers in the system. If we use xarray, we also can remove the logic code of resizing array, which can simplify the code.I am all for this, in concept. Some thoughts below ...quoted
@@ -56,10 +51,8 @@ struct list_lru { #ifdef CONFIG_MEMCG_KMEM struct list_head list; int shrinker_id; - /* protects ->memcg_lrus->lrus[i] */ - spinlock_t lock; /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ - struct list_lru_memcg __rcu *memcg_lrus; + struct xarray *xa; #endifNormally, we embed an xarray in its containing structure instead of allocating it. It's only a pointer, int and spinlock, so generally 16 bytes, as opposed to the 8 bytes for the pointer and a 16 byte allocation. There is a minor wrinkle in that currently 'NULL' is used to indicate "is not cgroup aware". Maybe there's another way to indicate that?
Sure. I can drop patch 8 in this series. In that case, we can use ->memcg_aware to indicate that.
quoted
@@ -51,22 +51,12 @@ static int lru_shrinker_id(struct list_lru *lru) static inline struct list_lru_one * list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx) { - struct list_lru_memcg *memcg_lrus; - struct list_lru_node *nlru = &lru->node[nid]; + if (list_lru_memcg_aware(lru) && idx >= 0) { + struct list_lru_per_memcg *mlru = xa_load(lru->xa, idx); - /* - * Either lock or RCU protects the array of per cgroup lists - * from relocation (see memcg_update_list_lru). - */ - memcg_lrus = rcu_dereference_check(lru->memcg_lrus, - lockdep_is_held(&nlru->lock)); - if (memcg_lrus && idx >= 0) { - struct list_lru_per_memcg *mlru; - - mlru = rcu_dereference_check(memcg_lrus->lrus[idx], true); return mlru ? &mlru->nodes[nid] : NULL; } - return &nlru->lru; + return &lru->node[nid].lru; }... perhaps we move the xarray out from under the #ifdef and use index 0 for non-memcg-aware lrus? The XArray is specially optimised for arrays which only have one entry at 0.
Sounds like a good idea. I can do a try.
quoted
int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp) { + XA_STATE(xas, lru->xa, 0); unsigned long flags; - struct list_lru_memcg *memcg_lrus; - int i; + int i, ret = 0; struct list_lru_memcg_table { struct list_lru_per_memcg *mlru;@@ -601,22 +522,45 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g } } - spin_lock_irqsave(&lru->lock, flags); - memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true); + xas_lock_irqsave(&xas, flags); while (i--) { int index = memcg_cache_id(table[i].memcg); struct list_lru_per_memcg *mlru = table[i].mlru; - if (index < 0 || rcu_dereference_protected(memcg_lrus->lrus[index], true)) + xas_set(&xas, index); +retry: + if (unlikely(index < 0 || ret || xas_load(&xas))) { kfree(mlru); - else - rcu_assign_pointer(memcg_lrus->lrus[index], mlru); + } else { + ret = xa_err(xas_store(&xas, mlru));This is mixing advanced and normal XArray concepts ... sorry to have confused you. I think what you meant to do here was: xas_store(&xas, mlru); ret = xas_error(&xas);
Sure. Thanks for pointing it out. It's my bad usage.
Or you can avoid introducing 'ret' at all, and keep your errors in the xa_state. You're kind of mirroring the xa_state errors into 'ret' anyway, so that seems easier to understand?
Make sense. I will do this in the next version. Thanks for your all suggestions.
quoted
- memcg_id = memcg_alloc_cache_id(); + memcg_id = ida_simple_get(&memcg_cache_ida, 0, MEMCG_CACHES_MAX_SIZE, + GFP_KERNEL);memcg_id = ida_alloc_max(&memcg_cache_ida, MEMCG_CACHES_MAX_SIZE - 1, GFP_KERNEL); ... although i think there's actually a fencepost error, and this really should be MEMCG_CACHES_MAX_SIZE.
Totally agree. I have fixed this issue in patch 19.
quoted
objcg = obj_cgroup_alloc(); if (!objcg) { - memcg_free_cache_id(memcg_id); + ida_simple_remove(&memcg_cache_ida, memcg_id);ida_free(&memcg_cache_ida, memcg_id);
I Will update to this new API.