Re: [v5 2/4] mm, oom: cgroup-aware OOM killer
From: Roman Gushchin <hidden>
Date: 2017-08-23 16:21:06
Also in:
linux-mm, lkml
Hi Johannes! Thank you for review! I do agree with most of the comments, and I will address them in v6. I'll post it soon. Please, find some comments below. On Tue, Aug 22, 2017 at 01:03:44PM -0400, Johannes Weiner wrote:
Hi Roman, great work! This looks mostly good to me now. Below are some nitpicks concerning naming and code layout, but nothing major.quoted
+ + css_task_iter_start(&memcg->css, 0, &it); + while ((task = css_task_iter_next(&it))) { + /* + * If there are no tasks, or all tasks have oom_score_adj set + * to OOM_SCORE_ADJ_MIN and oom_kill_all_tasks is not set, + * don't select this memory cgroup. + */ + if (!elegible && + (memcg->oom_kill_all_tasks || + task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN)) + elegible = 1;This is a little awkward to read. How about something like this: /* * When killing individual tasks, we respect OOM score adjustments: * at least one task in the group needs to be killable for the group * to be oomable. * * Also check that previous OOM kills have finished, and abort if * there are any pending OOM victims. */ oomable = memcg->oom_kill_all_tasks; while ((task = css_task_iter_next(&it))) { if (!oomable && task->signal_oom_score_adj != OOM_SCORE_ADJ_MIN) oomable = 1;quoted
+ if (tsk_is_oom_victim(task) && + !test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags)) { + elegible = -1; + break; + } + } + css_task_iter_end(&it);
We ignore oom_score_adj if oom_kill_all_tasks is set, it's not reflected in your version. Anyway, I've moved the comments block outside and rephrased it to make more clear.
etc.quoted
+ + return elegible > 0 ? memcg_oom_badness(memcg, nodemask) : elegible;I find these much easier to read if broken up, even if it's more LOC: if (eligible <= 0) return eligible; return memcg_oom_badness(memcg, nodemask);quoted
+static void select_victim_memcg(struct mem_cgroup *root, struct oom_control *oc) +{ + struct mem_cgroup *iter, *parent; + + for_each_mem_cgroup_tree(iter, root) { + if (memcg_has_children(iter)) { + iter->oom_score = 0; + continue; + } + + iter->oom_score = oom_evaluate_memcg(iter, oc->nodemask); + if (iter->oom_score == -1) {Please add comments to document the special returns. Maybe #defines would be clearer, too.quoted
+ oc->chosen_memcg = (void *)-1UL; + mem_cgroup_iter_break(root, iter); + return; + } + + if (!iter->oom_score) + continue;Same here. Maybe a switch would be suitable to handle the abort/no-score cases.
Not sure about switch/defines, but I've added several comment blocks to describe possible return values, as well as their handling. Hope, it will be enough.
quoted
static int memory_events_show(struct seq_file *m, void *v) { struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));@@ -5310,6 +5512,12 @@ static struct cftype memory_files[] = { .write = memory_max_write, }, { + .name = "oom_kill_all_tasks", + .flags = CFTYPE_NOT_ON_ROOT, + .seq_show = memory_oom_kill_all_tasks_show, + .write = memory_oom_kill_all_tasks_write, + },This name is quite a mouthful and reminiscent of the awkward v1 interface names. It doesn't really go well with the v2 names. How about memory.oom_group?
I'd prefer to have something more obvious. I've renamed memory.oom_kill_all_tasks to memory.oom_kill_all, which was earlier suggested by Vladimir. Are you ok with it? Thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>