Thread (6 messages) 6 messages, 2 authors, 2015-12-12

[4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917

From: grygorii.strashko@ti.com (Grygorii Strashko)
Date: 2015-09-04 16:46:51
Also in: linux-rt-users
Subsystem: arm port, the rest · Maintainers: Russell King, Linus Torvalds

Hi All,

On 08/17/2015 06:46 PM, Grygorii Strashko wrote:
On 08/16/2015 02:42 PM, Sebastian Andrzej Siewior wrote:
quoted
* Grygorii Strashko | 2015-08-12 21:25:50 [+0300]:
quoted
I can constantly see below error report with RT-kernel on TI ARM dra7-evm
if I'm trying to do Suspend to RAM.
do you see the same problem on x86 with -RT?
Unfortunately, I'm not working with x86 now, and I'm not sure I'll be able
to try it in the nearest future.

Below, I've tried to analyze involved code path and it seems expected
to call clear_tasks_mm_cpumask() in atomic context, as the whole
take_cpu_down() call chain expected to be atomic.
quoted
quoted
Disabling non-boot CPUs ...
PM: noirq suspend of devices complete after 7.295 msecs
[  100.285729] Disabling non-boot CPUs ...
[  100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
[  100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
[  100.287583] INFO: lockdep is turned off.
[  100.287586] irq event stamp: 122
[  100.287600] hardirqs last  enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
[  100.287609] hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
[  100.287620] softirqs last  enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
[  100.287625] softirqs last disabled at (0): [<  (null)>]   (null)
[  100.287630] Preemption disabled at:[<  (null)>]   (null)
[  100.287631]
[  100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G        W       4.1.4-rt3-01046-g96ac8da #204
[  100.287642] Hardware name: Generic DRA74X (Flattened Device Tree)
[  100.287659] [<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[  100.287671] [<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[  100.287681] [<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[  100.287689] [<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[  100.287699] [<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
this is task_lock() which takes a sleeping lock.
quoted
[  100.287710] [<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[  100.287720] [<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[  100.287728] [<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[  100.287742] [<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[  100.287754] [<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
this function contains local_save_flags().
multi_cpu_stop:
	local_save_flags(flags);

	if (!msdata->active_cpus)
		is_active = cpu == cpumask_first(cpu_online_mask);
	else
		is_active = cpumask_test_cpu(cpu, msdata->active_cpus);

	/* Simple state machine */
	do {
		/* Chill out and ensure we re-read multi_stop_state. */
		cpu_relax();
		if (msdata->state != curstate) {
			curstate = msdata->state;
			switch (curstate) {
			case MULTI_STOP_DISABLE_IRQ:
				local_irq_disable();
				hard_irq_disable(); <==== Step 2 disable IRQs
				break;
			case MULTI_STOP_RUN:
				if (is_active)
					err = msdata->fn(msdata->data); ===> Step 3 take_cpu_down()
				break;
			default:
				break;
			}
			ack_state(msdata);
		}
	} while (curstate != MULTI_STOP_EXIT);

	local_irq_restore(flags);
quoted
quoted
[  100.287767] [<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
cpu_stopper_thread:
		preempt_disable();

		ret = fn(arg); ===> multi_cpu_stop()
		if (ret)
			done->ret = ret;

		preempt_enable();

quoted
quoted
[  100.287779] [<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[  100.287791] [<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
[  100.288114] CPU1: shutdown
The local_save_flags() should be probably replaced with something else.
Below is list of clear_tasks_mm_cpumask() users:
arch/arm/kernel/smp.c
   int __cpu_disable(void)
   216: clear_tasks_mm_cpumask(cpu);
arch/arm64/kernel/smp.c
   int __cpu_disable(void)
   238: clear_tasks_mm_cpumask(cpu);
arch/metag/kernel/smp.c
   int __cpu_disable(void)
   290: clear_tasks_mm_cpumask(cpu);

arch/powerpc/mm/mmu_context_nohash.c
_cpu_down()
+ cpu_notify_nofail(CPU_DEAD | mod, hcpu);
   + mmu_context_cpu_notify()
     400: clear_tasks_mm_cpumask(cpu);

arch/sh/kernel/smp.c
   int __cpu_disable(void)
   155: clear_tasks_mm_cpumask(cpu);
arch/xtensa/kernel/smp.c
   int __cpu_disable(void)
   279: clear_tasks_mm_cpumask(cpu);

As per above, in all case, except *powerpc*, the clear_tasks_mm_cpumask()
is called from __cpu_disable() (atomic context).
While in powerpc case it's called from  _cpu_down() (if I understand right,
in non-atomic context).
Below diff fixes issue for me. More over, with this change I can do
CPU1 plug/unplug many times and suspend/resume works much more stable
(before it stuck some times after "Disabling non-boot CPUs ...").

I'd very appreciated for comments.
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3d6b782..20b5741 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -215,8 +215,6 @@ int __cpu_disable(void)
        flush_cache_louis();
        local_flush_tlb_all();
 
-       clear_tasks_mm_cpumask(cpu);
-
        return 0;
 }
 
@@ -232,6 +230,8 @@ void __cpu_die(unsigned int cpu)
                pr_err("CPU%u: cpu didn't die\n", cpu);
                return;
        }
+       clear_tasks_mm_cpumask(cpu);
+
        pr_notice("CPU%u: shutdown\n", cpu);
 
        /*

-- 
regards,
-grygorii
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help