Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback

[PATCH v3 00/13] kmem controller for memcg. · Glauber Costa <hidden> · 2012-09-18
[PATCH v3 02/13] memcg: Reclaim when more than one page needed. · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 02/13] memcg: Reclaim when more than one page needed. · Johannes Weiner <hannes@cmpxchg.org> · 2012-10-01
[PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-21
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-24
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Mel Gorman <mgorman@suse.de> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · James Bottomley <James.Bottomley@HansenPartnership.com> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · James Bottomley <James.Bottomley@HansenPartnership.com> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-03
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Mel Gorman <mgorman@suse.de> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-03
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-10-04
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-06
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-03
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Michal Hocko <hidden> · 2012-10-05
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Johannes Weiner <hannes@cmpxchg.org> · 2012-09-26
Re: [PATCH v3 04/13] kmem accounting basic infrastructure · Glauber Costa <hidden> · 2012-09-26
[PATCH v3 03/13] memcg: change defines to an enum · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 03/13] memcg: change defines to an enum · Johannes Weiner <hannes@cmpxchg.org> · 2012-10-01
Re: [PATCH v3 03/13] memcg: change defines to an enum · Glauber Costa <hidden> · 2012-10-02
[PATCH v3 08/13] res_counter: return amount of charges after res_counter_uncharge · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 08/13] res_counter: return amount of charges after res_counter_uncharge · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 08/13] res_counter: return amount of charges after res_counter_uncharge · Glauber Costa <hidden> · 2012-10-01
[PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Tejun Heo <tj@kernel.org> · 2012-09-21
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Glauber Costa <hidden> · 2012-09-24
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Glauber Costa <hidden> · 2012-10-04
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Glauber Costa <hidden> · 2012-10-04
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Johannes Weiner <hannes@cmpxchg.org> · 2012-10-05
Re: [PATCH v3 12/13] execute the whole memcg freeing in rcu callback · Glauber Costa <hidden> · 2012-10-08
[PATCH v3 11/13] memcg: allow a memcg with kmem charges to be destructed. · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 11/13] memcg: allow a memcg with kmem charges to be destructed. · Michal Hocko <hidden> · 2012-10-01
[PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · JoonSoo Kim <hidden> · 2012-09-20
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-09-21
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · JoonSoo Kim <hidden> · 2012-09-21
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-09-26
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-09-27
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-09-28
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Tejun Heo <tj@kernel.org> · 2012-09-30
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Tejun Heo <tj@kernel.org> · 2012-10-03
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 06/13] memcg: kmem controller infrastructure · Glauber Costa <hidden> · 2012-10-01
[PATCH v3 09/13] memcg: kmem accounting lifecycle management · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 09/13] memcg: kmem accounting lifecycle management · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 09/13] memcg: kmem accounting lifecycle management · Glauber Costa <hidden> · 2012-10-01
Re: [PATCH v3 09/13] memcg: kmem accounting lifecycle management · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 09/13] memcg: kmem accounting lifecycle management · Glauber Costa <hidden> · 2012-10-01
[PATCH v3 05/13] Add a __GFP_KMEMCG flag · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Rik van Riel <hidden> · 2012-09-18
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Christoph Lameter <hidden> · 2012-09-18
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Glauber Costa <hidden> · 2012-09-19
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Christoph Lameter <hidden> · 2012-09-19
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Mel Gorman <mgorman@suse.de> · 2012-09-27
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Glauber Costa <hidden> · 2012-09-27
Re: [PATCH v3 05/13] Add a __GFP_KMEMCG flag · Johannes Weiner <hannes@cmpxchg.org> · 2012-10-01
[PATCH v3 13/13] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 13/13] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs · Michal Hocko <hidden> · 2012-10-01
[PATCH v3 10/13] memcg: use static branches when code not in use · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 10/13] memcg: use static branches when code not in use · Michal Hocko <hidden> · 2012-10-01
Re: [PATCH v3 10/13] memcg: use static branches when code not in use · Glauber Costa <hidden> · 2012-10-01
[PATCH v3 07/13] mm: Allocate kernel pages to the right memcg · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 07/13] mm: Allocate kernel pages to the right memcg · Mel Gorman <mgorman@suse.de> · 2012-09-27
Re: [PATCH v3 07/13] mm: Allocate kernel pages to the right memcg · Glauber Costa <hidden> · 2012-09-28
Re: [PATCH v3 07/13] mm: Allocate kernel pages to the right memcg · Mel Gorman <mgorman@suse.de> · 2012-09-28
Re: [PATCH v3 07/13] mm: Allocate kernel pages to the right memcg · Michal Hocko <hidden> · 2012-09-27
[PATCH v3 01/13] memcg: Make it possible to use the stock for more than one page. · Glauber Costa <hidden> · 2012-09-18
Re: [PATCH v3 01/13] memcg: Make it possible to use the stock for more than one page. · Johannes Weiner <hannes@cmpxchg.org> · 2012-10-01

From: Glauber Costa <hidden>
Date: 2012-10-04 14:20:23
Also in: linux-mm, lkml

On 10/04/2012 02:53 PM, Glauber Costa wrote:

On 10/01/2012 05:27 PM, Michal Hocko wrote:

quoted

On Tue 18-09-12 18:04:09, Glauber Costa wrote:

quoted

A lot of the initialization we do in mem_cgroup_create() is done with softirqs
enabled. This include grabbing a css id, which holds &ss->id_lock->rlock, and
the per-zone trees, which holds rtpz->lock->rlock. All of those signal to the
lockdep mechanism that those locks can be used in SOFTIRQ-ON-W context. This
means that the freeing of memcg structure must happen in a compatible context,
otherwise we'll get a deadlock.

Maybe I am missing something obvious but why cannot we simply disble
(soft)irqs in mem_cgroup_create rather than make the free path much more
complicated. It really feels strange to defer everything (e.g. soft
reclaim tree cleanup which should be a no-op at the time because there
shouldn't be any user pages in the group).

Ok.

I was just able to come back to this today - I was mostly working on the
slab feedback over the past few days. I will answer yours and Tejun's
concerns at once:

Here is the situation: the backtrace I get is this one:

[  124.956725] =================================
[  124.957217] [ INFO: inconsistent lock state ]
[  124.957217] 3.5.0+ #99 Not tainted
[  124.957217] ---------------------------------
[  124.957217] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  124.957217] ksoftirqd/0/3 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  124.957217]  (&(&ss->id_lock)->rlock){+.?...}, at:
[<ffffffff810aa7b2>] spin_lock+0x9/0xb
[  124.957217] {SOFTIRQ-ON-W} state was registered at:
[  124.957217]   [<ffffffff810996ed>] __lock_acquire+0x31f/0xd68
[  124.957217]   [<ffffffff8109a660>] lock_acquire+0x108/0x15c
[  124.957217]   [<ffffffff81534ec4>] _raw_spin_lock+0x40/0x4f
[  124.957217]   [<ffffffff810aa7b2>] spin_lock+0x9/0xb
[  124.957217]   [<ffffffff810ad00e>] get_new_cssid+0x69/0xf3
[  124.957217]   [<ffffffff810ad0da>] cgroup_init_idr+0x42/0x60
[  124.957217]   [<ffffffff81b20e04>] cgroup_init+0x50/0x100
[  124.957217]   [<ffffffff81b05b9b>] start_kernel+0x3b9/0x3ee
[  124.957217]   [<ffffffff81b052d6>] x86_64_start_reservations+0xb1/0xb5
[  124.957217]   [<ffffffff81b053d8>] x86_64_start_kernel+0xfe/0x10b


So what we learn from it, is: we are acquiring a specific lock (the css
id one) from softirq context. It was previously taken in a
softirq-enabled context, that seems to be coming directly from
get_new_cssid.

Tejun correctly pointed out that we should never acquire that lock from
a softirq context, in which he is right.

But the situation changes slightly with kmem. Now, the following excerpt
of a backtrace is possible:

[   48.602775]  [<ffffffff81103095>] free_accounted_pages+0x47/0x4c
[   48.602775]  [<ffffffff81047f90>] free_task+0x31/0x5c
[   48.602775]  [<ffffffff8104807d>] __put_task_struct+0xc2/0xdb
[   48.602775]  [<ffffffff8104dfc7>] put_task_struct+0x1e/0x22
[   48.602775]  [<ffffffff8104e144>] delayed_put_task_struct+0x7a/0x98
[   48.602775]  [<ffffffff810cf0e5>] __rcu_process_callbacks+0x269/0x3df
[   48.602775]  [<ffffffff810cf28c>] rcu_process_callbacks+0x31/0x5b
[   48.602775]  [<ffffffff8105266d>] __do_softirq+0x122/0x277

So as you can see, free_accounted_pages (that will trigger a memcg_put()
-> mem_cgroup_free()) can now be called from softirq context, which is,
an rcu callback (and I just realized I wrote the exact opposite in the
subj line: man, I really suck at that!!)
As a matter of fact, we could not move to our rcu callback as well:

we need to move it to a worker thread with the rest.

We already have a worker thread: he reason we have it is not
static_branches: The reason is vfree(), that will BUG_ON(in_interrupt())
and could not be called from rcu callback as well. We moved static
branches in there as well for a similar problem, but haven't introduced it.

Could we move just part of it to the worker thread? Absolutely yes.
Moving just free_css_id() is enough to make it work. But since it is not
the first context related problem we had, I thought: "to hell with that,
let's move everything and be safe".

I am fine moving free_css_id() only if you would prefer.

Can we disable softirqs when we initialize css_id? Maybe. My machine
seems to boot fine and survive the simple workload that would trigger
that bug if I use irqsave spinlocks instead of normal spinlocks. But
this has to be done from cgroup core: We have no control over css
creation in memcg.

How would you guys like me to handle this ?

One more thing: As I mentioned in the Changelog,
mem_cgroup_remove_exceeded(), called from mem_cgroup_remove_from_trees()
will lead to the same usage pattern.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help