Thread: 33 messages, 8 authors, 2011-09-08

Re: [ANNOUNCE] 3.0.1-rt11

From: Frank Rowand <hidden>
Date: 2011-09-07 02:54:10
Also in: lkml

On 08/26/11 16:55, Paul E. McKenney wrote:
On Wed, Aug 24, 2011 at 04:58:49PM -0700, Frank Rowand wrote:
On 08/13/11 03:53, Peter Zijlstra wrote:
Whee, I can skip release announcements too!

So no the subject ain't no mistake its not, 3.0.1-rt11 is there for the
grabs.
< snip >
I have a consistent (every boot) hang on boot.  With a few
hacks to get console output, I get the

  rcu_preempt_state detected stalls on CPUs/tasks
< snip >
This is an ARM NaviEngine (out of tree, so I also have applied
a series of patches for platform support).

CONFIG_PREEMPT_RT_FULL is set.  Full config is attached.
I have also replicated the problem on the ARM RealView (in tree) and
without the RT patches.

Hmmm...  The last few that I have seen that looked like this were
due to my messing up rcutorture so that the RCU-boost testing kthreads
ran CPU-bound at real-time priority.

Is it possible that something similar is happening on your system?

                                                        Thanx, Paul

The problem turned out to be that the allowed cpus mask of the ksoftirqd
threads for the secondary processors had been set to all possible cpus.
Those threads were therefore no longer bound to their own cpus, so the
RCU softirq was never executing on cpu 2.

I'll test the following patch on 3.1 tomorrow.

-Frank Rowand


Symptom: rcu stall

The problem was that ksoftirqd was woken on the secondary processors before
the secondary processors were online.  This led to allowed cpus being set
to all cpus.

   wake_up_process()
      try_to_wake_up()
         select_task_rq()
            if (... || !cpu_online(cpu))
               select_fallback_rq(task_cpu(p), p)
                  ...
                  /* No more Mr. Nice Guy. */
                  dest_cpu = cpuset_cpus_allowed_fallback(p)
                     do_set_cpus_allowed(p, cpu_possible_mask)
                        #  Thus ksoftirqd can now run on any cpu...
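
The call chain above can be illustrated with a small user-space toy model.
This is only a sketch, not kernel code: every name starting with toy_ or
TOY_ is hypothetical, cpumasks are reduced to plain bit masks, and the
node-local search and cpuset handling of the real fallback path are left
out. It shows only the effect that matters here: waking a task whose sole
allowed cpu is not yet online silently widens its affinity.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4
#define TOY_POSSIBLE_MASK ((1u << TOY_NR_CPUS) - 1)

/* Hypothetical stand-in for struct task_struct. */
struct toy_task {
	unsigned int cpus_allowed;	/* bit n set: task may run on cpu n */
	int cpu;			/* cpu the task is bound to */
};

/* Hypothetical stand-in for cpu_online_mask; only the boot cpu is up. */
static unsigned int toy_online_mask = 1u << 0;

static bool toy_cpu_online(int cpu)
{
	return (toy_online_mask & (1u << cpu)) != 0;
}

/* Lowest-numbered online cpu contained in mask, or -1 if there is none. */
static int toy_first_online(unsigned int mask)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if ((mask & (1u << cpu)) && toy_cpu_online(cpu))
			return cpu;
	return -1;
}

/* Simplified model of the wakeup-time cpu selection with fallback. */
static int toy_select_task_rq(struct toy_task *p)
{
	int cpu = p->cpu;

	assert((toy_online_mask & TOY_POSSIBLE_MASK) != 0);

	/* Normal case: the bound cpu is allowed and online. */
	if ((p->cpus_allowed & (1u << cpu)) && toy_cpu_online(cpu))
		return cpu;

	/* Fallback: any other allowed cpu that is online. */
	cpu = toy_first_online(p->cpus_allowed);
	if (cpu >= 0)
		return cpu;

	/* "No more Mr. Nice Guy": affinity is reset to all possible cpus. */
	p->cpus_allowed = TOY_POSSIBLE_MASK;
	return toy_first_online(p->cpus_allowed);
}
```

In this model, a task bound to cpu 2 that is woken while only cpu 0 is
online gets placed on cpu 0, and its mask stays widened even after cpu 2
comes up later.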


Signed-off-by: Frank Rowand <redacted>
---
 kernel/softirq.c |   19 	14 +	5 -	0 !
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: b/kernel/softirq.c
===================================================================
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -55,6 +55,7 @@ EXPORT_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd_pending_online);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -862,28 +863,36 @@ static int __cpuinit cpu_callback(struct
 			return notifier_from_errno(PTR_ERR(p));
 		}
 		kthread_bind(p, hotcpu);
-  		per_cpu(ksoftirqd, hotcpu) = p;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = p;
  		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+		per_cpu(ksoftirqd, hotcpu) =
+			per_cpu(ksoftirqd_pending_online, hotcpu);
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		wake_up_process(per_cpu(ksoftirqd, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(ksoftirqd, hotcpu))
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
+		if (!p)
 			break;
 		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(ksoftirqd, hotcpu),
-			     cpumask_any(cpu_online_mask));
+		kthread_bind(p, cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
 		static const struct sched_param param = {
 			.sched_priority = MAX_RT_PRIO-1
 		};
 
-		p = per_cpu(ksoftirqd, hotcpu);
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
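
The idea behind the patch can be sketched outside the kernel as follows.
This is a toy model under stated assumptions, not the kernel
implementation: all toy_ names are hypothetical, per-cpu variables are
plain arrays, and the wakeup path is assumed to skip the wakeup when no
thread has been published for the cpu. The thread is parked in a pending
slot when it is created and only becomes visible to the wakeup path once
the cpu is online.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a ksoftirqd thread; counts wakeups only. */
struct toy_thread {
	int wakeups;
};

/* Toy versions of the two per-cpu pointers used by the patch. */
static struct toy_thread *toy_ksoftirqd[TOY_NR_CPUS];
static struct toy_thread *toy_pending_online[TOY_NR_CPUS];

/* Models thread creation: the thread is parked, not published. */
static void toy_up_prepare(int cpu, struct toy_thread *t)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_pending_online[cpu] = t;
}

/*
 * Models the wakeup path. Assumption: nothing is woken while no thread
 * is published for this cpu. Returns 1 if a wakeup happened, else 0.
 */
static int toy_wakeup_softirqd(int cpu)
{
	struct toy_thread *t = toy_ksoftirqd[cpu];

	if (!t)
		return 0;
	t->wakeups++;
	return 1;
}

/* Models the online step: publish the parked thread, then wake it. */
static void toy_online(int cpu)
{
	toy_ksoftirqd[cpu] = toy_pending_online[cpu];
	toy_pending_online[cpu] = NULL;
	toy_wakeup_softirqd(cpu);
}
```

A wakeup attempted between toy_up_prepare() and toy_online() finds no
published thread and does nothing, so in this model the thread is first
woken only once its cpu is online.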