cgroup_attach_task && while_each_thread (Was: cgroup attach task - slogging cpu)
From: Oleg Nesterov <hidden>
Date: 2013-10-09 18:20:28
Also in:
lkml
And I am starting to think that this change should also fix the
while_each_thread() problems in this particular case.
In generak the code like
rcu_read_lock();
task = find_get_task(...);
rcu_read_unlock();
rcu_read_lock();
t = task;
do {
...
} while_each_thread (task, t);
rcu_read_unlock();
is wrong even if while_each_thread() was correct (and we have a lot
of examples of this pattern). A GP can pass before the 2nd rcu-lock,
and we simply can't trust ->thread_group.next.
But I didn't notice that cgroup_attach_task(tsk, threadgroup) can only
be called with threadgroup == T when a) tsk is ->group_leader and b)
we hold threadgroup_lock() which blocks de_thread(). IOW, in this case
"tsk" can't be removed from ->thread_group list before other threads.
If next_thread() sees thread_group.next != leader, we know that the
that .next thread didn't do __unhash_process() yet, and since we
know that in this case "leader" didn't do this too we are safe.
In short: __unhash_process(leader) (in this) case can never change
->thread_group.next of another thread, because leader->thread_group
should be already list_empty().
On 10/09, Oleg Nesterov wrote:On 10/09, Oleg Nesterov wrote:quoted
On 10/09, Li Zefan wrote:quoted
Anjana, could you revise the patch and send it out with proper changelog and Signed-off-by? And please add "Cc: [off-list ref] # 3.9+"Yes, Anjana, please!Please note also that the PF_EXITING check has the same problem, it also needs "goto next".quoted
quoted
quoted
check in the main loop. So Anjana was right (sorry again!), and we should probably do ent.cgrp = task_cgroup_from_root(...); if (ent.cgrp != cgrp) { retval = flex_array_put(...); ... } if (!threadgroup) break;Or do { ... if (ent.cgrp == cgrp) goto next;Or this, agreed.quoted
quoted
Or I am wrong again?No, you are not! :)Thanks ;) Oleg.