Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into... | linux-block

(off-list ancestor, not in this archive)
REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-15
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-15
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-15
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-15
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-16
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-17
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-17
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-17
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-24
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-24
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Roman Gushchin <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Linus Torvalds <torvalds@linux-foundation.org> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Roman Gushchin <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Roman Gushchin <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Shakeel Butt <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · "Theodore Y. Ts'o" <tytso@mit.edu> · 2020-09-27
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-26
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Roman Gushchin <hidden> · 2020-09-26
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-25
Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag · Ming Lei <hidden> · 2020-09-25

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

From: Roman Gushchin <hidden>
Date: 2020-09-26 06:43:01
Also in: linux-ext4, linux-mm, lkml

On Sat, Sep 26, 2020 at 09:43:25AM +0800, Ming Lei wrote:

On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote:

quoted

On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt [off-list ref]
wrote:

quoted

[snip]

quoted

I don't think you can ignore the flushing. The __free_once() in
___cache_free() assumes there is a space available.

BTW do_drain() also have the same issue.

Why not move slabs_destroy() after we update ac->avail and memmove()?

Ming, can you please try the following patch?


From: Shakeel Butt <redacted>

[PATCH] mm: slab: fix potential infinite recursion in ___cache_free

With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
kmem_caches for all allocations"), it becomes possible to call kfree()
from the slabs_destroy(). However if slabs_destroy() is being called for
the array_cache of the local CPU then this opens the potential scenario
of infinite recursion because kfree() called from slabs_destroy() can
call slabs_destroy() with the same array_cache of the local CPU. Since
the array_cache of the local CPU is not updated before calling
slabs_destroy(), it will try to free the same pages.

To fix the issue, simply update the cache before calling
slabs_destroy().

Signed-off-by: Shakeel Butt <redacted>
---
 mm/slab.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3160dff6fd76..f658e86ec8ce 100644
--- a/mm/slab.c
+++ b/mm/slab.c

@@ -1632,6 +1632,10 @@ static void slab_destroy(struct kmem_cache *cachep, struct page *page)
 		kmem_cache_free(cachep->freelist_cache, freelist);
 }
 
+/*
+ * Update the size of the caches before calling slabs_destroy as it may
+ * recursively call kfree.
+ */
 static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
 {
 	struct page *page, *n;

@@ -2153,8 +2157,8 @@ static void do_drain(void *arg)
 	spin_lock(&n->list_lock);
 	free_block(cachep, ac->entry, ac->avail, node, &list);
 	spin_unlock(&n->list_lock);
-	slabs_destroy(cachep, &list);
 	ac->avail = 0;
+	slabs_destroy(cachep, &list);
 }
 
 static void drain_cpu_caches(struct kmem_cache *cachep)

@@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
 	}
 #endif
 	spin_unlock(&n->list_lock);
-	slabs_destroy(cachep, &list);
 	ac->avail -= batchcount;
 	memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
+	slabs_destroy(cachep, &list);
 }

The issue can't be reproduced after applying this patch:

Tested-by: Ming Lei <redacted>

Perfect, thank you very much for the confirmation!

Shakeel, can you, please, resend the patch with the proper fixes tag
and the updated commit log? Please, feel free to add
Reviewed-by: Roman Gushchin <redacted> .

Thank you!

Roman

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help