Thread (26 messages) 26 messages, 2 authors, 2021-06-16

Re: [PATCH v2 17/21] mm: list_lru: replace linear array with xarray

From: Matthew Wilcox <willy@infradead.org>
Date: 2021-05-27 12:09:08
Also in: linux-fsdevel, linux-nfs, lkml

On Thu, May 27, 2021 at 02:21:44PM +0800, Muchun Song wrote:
If we run 10k containers in the system, the size of the
list_lru_memcg->lrus can be ~96KB per list_lru. When we decrease the
number containers, the size of the array will not be shrinked. It is
not scalable. The xarray is a good choice for this case. We can save
a lot of memory when there are tens of thousands continers in the
system. If we use xarray, we also can remove the logic code of
resizing array, which can simplify the code.
I am all for this, in concept.  Some thoughts below ...
quoted hunk ↗ jump to hunk
@@ -56,10 +51,8 @@ struct list_lru {
 #ifdef CONFIG_MEMCG_KMEM
 	struct list_head	list;
 	int			shrinker_id;
-	/* protects ->memcg_lrus->lrus[i] */
-	spinlock_t		lock;
 	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
-	struct list_lru_memcg	__rcu *memcg_lrus;
+	struct xarray		*xa;
 #endif
Normally, we embed an xarray in its containing structure instead of
allocating it.  It's only a pointer, int and spinlock, so generally
16 bytes, as opposed to the 8 bytes for the pointer and a 16 byte
allocation.  There is a minor wrinkle in that currently 'NULL' is
used to indicate "is not cgroup aware".  Maybe there's another way
to indicate that?
quoted hunk ↗ jump to hunk
@@ -51,22 +51,12 @@ static int lru_shrinker_id(struct list_lru *lru)
 static inline struct list_lru_one *
 list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 {
-	struct list_lru_memcg *memcg_lrus;
-	struct list_lru_node *nlru = &lru->node[nid];
+	if (list_lru_memcg_aware(lru) && idx >= 0) {
+		struct list_lru_per_memcg *mlru = xa_load(lru->xa, idx);
 
-	/*
-	 * Either lock or RCU protects the array of per cgroup lists
-	 * from relocation (see memcg_update_list_lru).
-	 */
-	memcg_lrus = rcu_dereference_check(lru->memcg_lrus,
-					   lockdep_is_held(&nlru->lock));
-	if (memcg_lrus && idx >= 0) {
-		struct list_lru_per_memcg *mlru;
-
-		mlru = rcu_dereference_check(memcg_lrus->lrus[idx], true);
 		return mlru ? &mlru->nodes[nid] : NULL;
 	}
-	return &nlru->lru;
+	return &lru->node[nid].lru;
 }
... perhaps we move the xarray out from under the #ifdef and use index 0
for non-memcg-aware lrus?  The XArray is specially optimised for arrays
which only have one entry at 0.
quoted hunk ↗ jump to hunk
 int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp)
 {
+	XA_STATE(xas, lru->xa, 0);
 	unsigned long flags;
-	struct list_lru_memcg *memcg_lrus;
-	int i;
+	int i, ret = 0;
 
 	struct list_lru_memcg_table {
 		struct list_lru_per_memcg *mlru;
@@ -601,22 +522,45 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
 		}
 	}
 
-	spin_lock_irqsave(&lru->lock, flags);
-	memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
+	xas_lock_irqsave(&xas, flags);
 	while (i--) {
 		int index = memcg_cache_id(table[i].memcg);
 		struct list_lru_per_memcg *mlru = table[i].mlru;
 
-		if (index < 0 || rcu_dereference_protected(memcg_lrus->lrus[index], true))
+		xas_set(&xas, index);
+retry:
+		if (unlikely(index < 0 || ret || xas_load(&xas))) {
 			kfree(mlru);
-		else
-			rcu_assign_pointer(memcg_lrus->lrus[index], mlru);
+		} else {
+			ret = xa_err(xas_store(&xas, mlru));
This is mixing advanced and normal XArray concepts ... sorry to have
confused you.  I think what you meant to do here was:

			xas_store(&xas, mlru);
			ret = xas_error(&xas);

Or you can avoid introducing 'ret' at all, and keep your errors in the
xa_state.  You're kind of mirroring the xa_state errors into 'ret'
anyway, so that seems easier to understand?
-	memcg_id = memcg_alloc_cache_id();
+	memcg_id = ida_simple_get(&memcg_cache_ida, 0, MEMCG_CACHES_MAX_SIZE,
+				  GFP_KERNEL);
	memcg_id = ida_alloc_max(&memcg_cache_ida,
			MEMCG_CACHES_MAX_SIZE - 1, GFP_KERNEL);

... although i think there's actually a fencepost error, and this really
should be MEMCG_CACHES_MAX_SIZE.
 	objcg = obj_cgroup_alloc();
 	if (!objcg) {
-		memcg_free_cache_id(memcg_id);
+		ida_simple_remove(&memcg_cache_ida, memcg_id);
		ida_free(&memcg_cache_ida, memcg_id);

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help