Thread (40 messages) 40 messages, 5 authors, 2018-01-16

Re: [PATCH] cgroup/cpuset: fix circular locking dependency

From: Prateek Sood <hidden>
Date: 2017-12-28 20:37:33
Also in: lkml

On 12/13/2017 09:36 PM, Tejun Heo wrote:
Hello, Prateek.

On Wed, Dec 13, 2017 at 01:20:46PM +0530, Prateek Sood wrote:
quoted
This change makes the usage of cpuset_hotplug_workfn() from cpu
hotplug path synchronous. For memory hotplug it still remains
asynchronous.
Ah, right.
quoted
Memory migration happening from cpuset_hotplug_workfn() is
already asynchronous by queuing cpuset_migrate_mm_workfn() in
cpuset_migrate_mm_wq.

cpuset_hotplug_workfn()
   cpuset_hotplug_workfn(()
      cpuset_migrate_mm()
         queue_work(cpuset_migrate_mm_wq)

It seems that memory migration latency might not have
impact with this change.

Please let me know if you meant something else by cpuset
migration taking time when memory migration is turned on.
No, I didn't.  I was just confused about which part became
synchronous.  So, I don't have anything against making the cpu part
synchronous, but let's not do that as the fix to the deadlocks cuz,
while we can avoid them by changing cpuset, I don't think cpuset is
the root cause for them.  If there are benefits to making cpuset cpu
migration synchronous, let's do that for those benefits.

Thanks.
TJ,

One more deadlock scenario

Task: sh 
[<ffffff874f917290>] wait_for_completion+0x14
[<ffffff874e8a82e0>] cpuhp_kick_ap_work+0x80 //waiting for cpuhp/2
[<ffffff874f913780>] _cpu_down+0xe0
[<ffffff874e8a9934>] cpu_down+0x38
[<ffffff874ef6e4ec>] cpu_subsys_offline+0x10

Task: cpuhp/2 
[<ffffff874f91645c>] schedule+0x38
[<ffffff874e92c76c>] _synchronize_rcu_expedited+0x2ec
[<ffffff874e92c874>] synchronize_sched_expedited+0x60
[<ffffff874e92c9f8>] synchronize_sched+0xb0
[<ffffff874e9104e4>] sugov_stop+0x58
[<ffffff874f36967c>] cpufreq_stop_governor+0x48
[<ffffff874f36a89c>] cpufreq_offline+0x84
[<ffffff874f36aa30>] cpuhp_cpufreq_offline+0xc
[<ffffff874e8a797c>] cpuhp_invoke_callback+0xac
[<ffffff874e8a89b4>] cpuhp_down_callbacks+0x58
[<ffffff874e8a95e8>] cpuhp_thread_fun+0xa8

_synchronize_rcu_expedited is waiting for execution of rcu
expedited grace period work item wait_rcu_exp_gp()

Task: kworker/2:1 
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f916870>] schedule_preempt_disabled+0x20
[<ffffff874f918df8>] __mutex_lock_slowpath+0x158
[<ffffff874f919004>] mutex_lock+0x14
[<ffffff874e8a77b0>] get_online_cpus+0x34 //waiting for cpu_hotplug_lock
[<ffffff874e96452c>] rebuild_sched_domains+0x30
[<ffffff874e964648>] cpuset_hotplug_workfn+0xb8
[<ffffff874e8c27b8>] process_one_work+0x168
[<ffffff874e8c2ff4>] worker_thread+0x140
[<ffffff874e8c95b8>] kthread+0xe0

cpu_hotplug_lock is acquired by task: sh


Task: kworker/2:3
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f91a384>] schedule_timeout+0x1d8
[<ffffff874f9171d4>] wait_for_common+0xb4
[<ffffff874f917304>] wait_for_completion_killable+0x14 //waiting for kthreadd 
[<ffffff874e8c931c>] __kthread_create_on_node+0xec
[<ffffff874e8c9448>] kthread_create_on_node+0x64
[<ffffff874e8c2d88>] create_worker+0xb4
[<ffffff874e8c3194>] worker_thread+0x2e0
[<ffffff874e8c95b8>] kthread+0xe0


Task: kthreadd
[<ffffff874e8858ec>] __switch_to+0x94
[<ffffff874f915ed8>] __schedule+0x2a8
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f91a0ec>] rwsem_down_read_failed+0xe8
[<ffffff874e913dc4>] __percpu_down_read+0xfc
[<ffffff874e8a4ab0>] copy_process.isra.72.part.73+0xf60
[<ffffff874e8a53b8>] _do_fork+0xc4
[<ffffff874e8a5720>] kernel_thread+0x34
[<ffffff874e8ca83c>] kthreadd+0x144

kthreadd is waiting for cgroup_threadgroup_rwsem acquired
by task T

Task: T
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f916870>] schedule_preempt_disabled+0x20
[<ffffff874f918df8>] __mutex_lock_slowpath+0x158
[<ffffff874f919004>] mutex_lock+0x14 
[<ffffff874e962ff4>] cpuset_can_attach+0x58
[<ffffff874e95d640>] cgroup_taskset_migrate+0x8c
[<ffffff874e95d9b4>] cgroup_migrate+0xa4
[<ffffff874e95daf0>] cgroup_attach_task+0x100
[<ffffff874e95df28>] __cgroup_procs_write.isra.35+0x228
[<ffffff874e95e00c>] cgroup_tasks_write+0x10
[<ffffff874e958294>] cgroup_file_write+0x44
[<ffffff874eaa4384>] kernfs_fop_write+0xc0

task T is waiting for cpuset_mutex acquired
by kworker/2:1

sh ==> cpuhp/2 ==> kworker/2:1 ==> sh 

kworker/2:3 ==> kthreadd ==> Task T ==> kworker/2:1

It seems that my earlier patch set should fix this scenario:
1) Inverting locking order of cpuset_mutex and cpu_hotplug_lock.
2) Make cpuset hotplug work synchronous.


Could you please share your feedback.


Thanks

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help