Re: [PATCH 4/4] memcg: get rid of percpu_charge_mutex lock
From: Johannes Weiner <hidden>
Date: 2011-08-08 18:47:56
Also in:
lkml
On Fri, Jul 22, 2011 at 01:20:25PM +0200, Michal Hocko wrote:
quoted hunk ↗ jump to hunk
percpu_charge_mutex protects from multiple simultaneous per-cpu charge caches draining because we might end up having too many work items. At least this was the case until 26fe6168 (memcg: fix percpu cached charge draining frequency) when we introduced a more targeted draining for async mode. Now that also sync draining is targeted we can safely remove mutex because we will not send more work than the current number of CPUs. FLUSHING_CACHED_CHARGE protects from sending the same work multiple times and stock->nr_pages == 0 protects from pointless sending a work if there is obviously nothing to be done. This is of course racy but we can live with it as the race window is really small (we would have to see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still non-zero). The only remaining place where we can race is synchronous mode when we rely on FLUSHING_CACHED_CHARGE test which might have been set by other drainer on the same group but we should wait in that case as well. Signed-off-by: Michal Hocko <redacted> --- mm/memcontrol.c | 12 ++---------- 1 files changed, 2 insertions(+), 10 deletions(-)diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3685107..f8463a0 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c@@ -2071,7 +2071,6 @@ struct memcg_stock_pcp { #define FLUSHING_CACHED_CHARGE (0) }; static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock); -static DEFINE_MUTEX(percpu_charge_mutex); /* * Try to consume stocked charge on this cpu. If success, one page is consumed@@ -2178,7 +2177,8 @@ static void drain_all_stock(struct mem_cgroup *root_mem, bool sync) for_each_online_cpu(cpu) { struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); - if (test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) + if (mem_cgroup_same_or_subtree(root_mem, stock->cached) && + test_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) flush_work(&stock->work); } out:
This hunk triggers a crash for me, as the draining is already done and
stock->cached reset to NULL when dereferenced here. Oops is attached.
We have this loop in drain_all_stock():
for_each_online_cpu(cpu) {
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *mem;
mem = stock->cached;
if (!mem || !stock->nr_pages)
continue;
if (!mem_cgroup_same_or_subtree(root_mem, mem))
continue;
if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
if (cpu == curcpu)
drain_local_stock(&stock->work);
else
schedule_work_on(cpu, &stock->work);
}
}
The only thing that stabilizes stock->cached is the knowledge that
there are still pages accounted to the memcg.
Without the mutex serializing this code, can't there be a concurrent
execution that leads to stock->cached being drained, becoming empty
and freed by someone else between the stock->nr_pages check and the
ancestor check, resulting in use after free?
What makes stock->cached safe to dereference?
[ 2313.442944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 2313.443935] IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] PGD 4ae7a067 PUD 4adc4067 PMD 0
[ 2313.443935] Oops: 0000 [#1] PREEMPT SMP
[ 2313.443935] CPU 0
[ 2313.443935] Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
[ 2313.443935] RIP: 0010:[<ffffffff81083b70>] [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
[ 2313.443935] RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
[ 2313.443935] RAX: ffff8800781bb310 RBX: 0000000000000000 RCX: 000000000000003e
[ 2313.443935] RDX: 0000000000000000 RSI: ffff8800779f7c00 RDI: 0000000000000000
[ 2313.443935] RBP: ffff880077b09c98 R08: ffffffff818a4e88 R09: 0000000000000000
[ 2313.443935] R10: 0000000000000000 R11: dead000000100100 R12: ffff8800779f7c00
[ 2313.443935] R13: ffff8800779f7c00 R14: 0000000000000000 R15: ffff88007bc0eb80
[ 2313.443935] FS: 00007f5d689ec720(0000) GS:ffff88007bc00000(0000) knlGS:0000000000000000
[ 2313.443935] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2313.443935] CR2: 0000000000000018 CR3: 000000004ad57000 CR4: 00000000000006f0
[ 2313.443935] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2313.443935] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2313.443935] Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
[ 2313.443935] Stack:
[ 2313.443935] ffffffff818a4e88 000000000000eb80 ffff880077b09ca8 ffffffff810feba3
[ 2313.443935] ffff880077b09d08 ffffffff810feccf ffff880077b09cf8 0000000000000001
[ 2313.443935] ffff88007bd0eb80 0000000000000001 ffff880077af2000 0000000000000000
[ 2313.443935] Call Trace:
[ 2313.443935] [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
[ 2313.443935] [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
[ 2313.443935] [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
[ 2313.443935] [<ffffffff81111872>] ? path_put+0x22/0x30
[ 2313.443935] [<ffffffff8111c925>] ? __d_lookup+0xb5/0x170
[ 2313.443935] [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
[ 2313.443935] [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
[ 2313.443935] [<ffffffff81063990>] ? abort_exclusive_wait+0xb0/0xb0
[ 2313.443935] [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
[ 2313.443935] [<ffffffff811233d3>] ? mnt_want_write+0x43/0x80
[ 2313.443935] [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
[ 2313.443935] [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
[ 2313.443935] [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
[ 2313.443935] Code: b7 42 0a 5d c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90 48 89 fb 49 89 f4 e8 10 85 00 00
[ 2313.443935] 8b 43 18 49 8b 54 24 18 48 85 d2 74 05 48 85 c0 75 15 31 db
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>