Re: memory-cgroup bug
From: Michal Hocko <hidden>
Date: 2012-11-22 21:42:59
Also in:
cgroups, lkml
Possibly related (same subject, not in this thread)
- 2012-11-23 · Re: memory-cgroup bug · azurIt <hidden>
- 2012-11-22 · Re: memory-cgroup bug · Michal Hocko <hidden>
- 2012-11-22 · Re: memory-cgroup bug · Kamezawa Hiroyuki <hidden>
On Thu 22-11-12 19:05:26, azurIt wrote: [...]
My cgroups hierarchy: /cgroups/<user_id>/uid/ where '<user_id>' is system user id and 'uid' is just word 'uid'. Memory limits are set in /cgroups/<user_id>/ and hierarchy is enabled. Processes are inside /cgroups/<user_id>/uid/ . I'm using hard limits for memory and swap BUT system has no swap at all (it has 'only' 16 GB of real RAM). memory.oom_control is set to 'oom_kill_disable 0'. Server has enough of free memory when problem occurs.
OK, so so the global reclaim shouldn't be active. This is definitely good to know.
quoted
quoted
This happens when problem occures: - no new processes can be started for this cgroup - current processes are freezed and taking 100% of CPU - when i try to 'strace' any of current processes, the whole strace freezes until process is killed (strace cannot be terminated by CTRL-c) - problem can be resolved by raising memory limit for cgroup or killing of few processes inside cgroup so some memory is freed I also garbbed the content of /proc/<pid>/stack of freezed process: [<ffffffff8110a9c1>] mem_cgroup_handle_oom+0x241/0x3b0 [<ffffffff8110b5ab>] T.1146+0x5ab/0x5c0Hmm what is this?Really doesn't know, i will get stack of all freezed processes next time so we can compare it.quoted
quoted
[<ffffffff8110ba56>] mem_cgroup_charge_common+0x56/0xa0 [<ffffffff8110bae5>] mem_cgroup_newpage_charge+0x45/0x50 [<ffffffff810ec54e>] do_wp_page+0x14e/0x800 [<ffffffff810eda34>] handle_pte_fault+0x264/0x940 [<ffffffff810ee248>] handle_mm_fault+0x138/0x260 [<ffffffff810270ed>] do_page_fault+0x13d/0x460 [<ffffffff815b53ff>] page_fault+0x1f/0x30 [<ffffffffffffffff>] 0xffffffffffffffff
Btw. is this stack stable or is the task bouncing in some loop? And finally could you post the disassembly of your version of mem_cgroup_handle_oom, please?
quoted
How many tasks are hung in mem_cgroup_handle_oom? If there were many of them then it'd smell like an issue fixed by 79dfdaccd1d5 (memcg: make oom_lock 0 and 1 based rather than counter) and its follow up fix 23751be00940 (memcg: fix hierarchical oom locking) but you are saying that you can reproduce with 3.2 and those went in for 3.1. 2.6.32 would make more sense.Usually maximum of several 10s of processes but i will check it next time. I was having much worse problems in 2.6.32 - when freezing happens, the whole server was affected (i wasn't able to do anything and needs to wait until my scripts takes case of it and killed apache, so i don't have any detailed info).
Hmm, maybe the issue fixed by 1d65f86d (mm: preallocate page before lock_page() at filemap COW) which was merged in 3.1.
In 3.2 only target cgroup is affected.quoted
quoted
I'm currently using kernel 3.2.34 but i'm having this problem since 2.6.32.I guess this is a clean vanilla (stable) kernel, right? Are you able to reproduce with the latest Linus tree?Well, no. I'm using, for example, newest stable grsecurity patch.
That shouldn't be related
I'm also using few of Andrea Righi's cgroup subsystems but i don't believe these are doing problems: - cgroup-uid which is moving processes into cgroups based on UID - cgroup-task which can limit number of tasks in cgroup (i already tried to disable this one, it didn't help) http://www.develer.com/~arighi/linux/patches/
I am not familiar with those pathces but I will double check.
Unfortunately i cannot just install new and untested kernel version cos i'm not able to reproduce this problem anytime (it's happening randomly in production environment).
This will make it a bit harder to debug but let's see maybe the new traces would help...
Could it be that OOM cannot start and kill processes because there's no free memory in cgroup?
That shouldn't happen. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>