Thread: 29 messages, 5 authors, 2020-09-27

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

From: Shakeel Butt <hidden>
Date: 2020-09-25 17:35:18
Also in: linux-ext4, linux-mm, lkml

On Fri, Sep 25, 2020 at 10:22 AM Shakeel Butt [off-list ref] wrote:
On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
[off-list ref] wrote:
On Fri, Sep 25, 2020 at 9:19 AM Ming Lei [off-list ref] wrote:
git bisect shows the first bad commit:

        [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a single set of
                kmem_caches for all allocations

And I have double checked that the above commit is really the first bad
commit for the list corruption issue of 'list_del corruption, ffffe1c241b00408->next
is LIST_POISON1 (dead000000000100)',
That commit doesn't revert cleanly, but I think that's purely because
we'd also need to revert

  849504809f86 ("mm: memcg/slab: remove unused argument by charge_slab_page()")
  74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
(un)account_slab_page()")

too.

Can you verify that a

    git revert 74d555bed5d0 849504809f86 10befea91b61

on top of current -git makes things work for you again?

I'm going to do an rc8 this release simply because we have another VM
issue that I hope to get fixed - but there we know what the problem
and the fix _is_, it just needs some care.

So if Roman (or somebody else) can see what's wrong and we can fix
this quickly, we don't need to go down the revert path, but ..
I think I have a theory. The issue is happening due to the potential
infinite recursion:

[ 5060.124412]  ___cache_free+0x488/0x6b0
*****Second recursion
[ 5060.128666]  kfree+0xc9/0x1d0
[ 5060.131947]  kmem_freepages+0xa0/0xf0
[ 5060.135746]  slab_destroy+0x19/0x50
[ 5060.139577]  slabs_destroy+0x6d/0x90
[ 5060.143379]  ___cache_free+0x4a3/0x6b0
*****First recursion
[ 5060.147896]  kfree+0xc9/0x1d0
[ 5060.151082]  kmem_freepages+0xa0/0xf0
[ 5060.155121]  slab_destroy+0x19/0x50
[ 5060.159028]  slabs_destroy+0x6d/0x90
[ 5060.162920]  ___cache_free+0x4a3/0x6b0
[ 5060.167097]  kfree+0xc9/0x1d0

___cache_free() calls cache_flusharray() to flush the local CPU
array_cache when the cache holds at least as many elements as its limit
(ac->avail >= ac->limit).
cache_flusharray() removes batchcount elements from the local CPU
array_cache and passes them to slabs_destroy() (if the node's shared
cache is also full).

Note that at this point we have not yet updated the local CPU
array_cache size, but we have already called slabs_destroy(), which can
call kfree() through unaccount_slab_page().

We are on the same CPU, so this recursive kfree() again checks
(ac->avail >= ac->limit), calls cache_flusharray() again, and recurses
indefinitely.
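
The recursion described above can be modeled in a few lines of user-space C.
This is only a toy sketch, not kernel code: the bug_* names, the struct, and
the BUG_MAX_DEPTH cap (which exists only so the model terminates) are all
hypothetical, and bug_slabs_destroy() merely stands in for the
slabs_destroy() -> unaccount_slab_page() -> kfree() path.

```c
#include <assert.h>

#define BUG_MAX_DEPTH 8  /* hypothetical cap so the toy model terminates */

/* Toy stand-in for the per-CPU array_cache; not the kernel struct. */
struct bug_ac {
    int avail;       /* objects currently held in the per-CPU cache */
    int limit;       /* flush threshold */
    int batchcount;  /* objects removed per flush */
};

static int bug_depth;      /* current nesting of the flush path */
static int bug_max_depth;  /* deepest nesting observed */

static void bug_kfree(struct bug_ac *ac);

/* Stands in for slabs_destroy(): freeing the slab page also frees its
 * accounting data with kfree(), as via unaccount_slab_page(). */
static void bug_slabs_destroy(struct bug_ac *ac)
{
    bug_kfree(ac);
}

/* Problematic order: destroy slabs first, shrink ac->avail afterwards. */
static void bug_flusharray(struct bug_ac *ac)
{
    bug_depth++;
    if (bug_depth > bug_max_depth)
        bug_max_depth = bug_depth;

    if (bug_depth < BUG_MAX_DEPTH)   /* the real path has no such cap */
        bug_slabs_destroy(ac);       /* ac->avail is still >= ac->limit */

    ac->avail -= ac->batchcount;     /* updated too late */
    bug_depth--;
}

static void bug_kfree(struct bug_ac *ac)
{
    if (ac->avail >= ac->limit)
        bug_flusharray(ac);
    ac->avail++;
}

/* Frees one object into a full cache; returns the deepest flush nesting. */
static int bug_run(void)
{
    struct bug_ac ac = { .avail = 4, .limit = 4, .batchcount = 2 };

    assert(ac.batchcount > 0);
    bug_depth = 0;
    bug_max_depth = 0;
    bug_kfree(&ac);
    return bug_max_depth;
}
```

In this model a single bug_kfree() on a full cache keeps nesting until it
hits the artificial cap, because nothing changes ac->avail before the nested
call re-checks it.
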
I can see two possible fixes. We can either do async kfree of
page_obj_cgroups(page) or we can update the local cpu array_cache's
size before slabs_destroy().
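
The second option (updating the local CPU array_cache size before
slabs_destroy()) can be sketched with a toy user-space model. This is not the
actual kernel patch: the fix_* names and the struct are hypothetical, and the
nested fix_kfree() call merely stands in for slabs_destroy() reaching kfree()
through unaccount_slab_page().

```c
#include <assert.h>

/* Toy stand-in for the per-CPU array_cache; not the kernel struct. */
struct fix_ac {
    int avail;       /* objects currently held in the per-CPU cache */
    int limit;       /* flush threshold */
    int batchcount;  /* objects removed per flush */
};

static int fix_depth;      /* current nesting of the flush path */
static int fix_max_depth;  /* deepest nesting observed */

static void fix_kfree(struct fix_ac *ac);

/* Reordered flush: shrink ac->avail first, then destroy the slabs. */
static void fix_flusharray(struct fix_ac *ac)
{
    fix_depth++;
    if (fix_depth > fix_max_depth)
        fix_max_depth = fix_depth;

    ac->avail -= ac->batchcount;  /* cache is below its limit from here on */
    fix_kfree(ac);                /* stands in for slabs_destroy() -> kfree() */

    fix_depth--;
}

static void fix_kfree(struct fix_ac *ac)
{
    if (ac->avail >= ac->limit)
        fix_flusharray(ac);
    ac->avail++;
}

/* Frees one object into a full cache; returns the deepest flush nesting. */
static int fix_run(void)
{
    struct fix_ac ac = { .avail = 4, .limit = 4, .batchcount = 2 };

    assert(ac.batchcount > 0 && ac.batchcount <= ac.limit);
    fix_depth = 0;
    fix_max_depth = 0;
    fix_kfree(&ac);
    return fix_max_depth;
}
```

With the size updated first, the nested free sees ac->avail below ac->limit
and takes the fast path, so the flush path is entered only once in this model.
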

Shakeel