Thread (4 messages) 4 messages, 2 authors, 2008-06-04

Re: [PATCH] sched: fix cpupri hotplug support

From: Gregory Haskins <hidden>
Date: 2008-06-04 14:30:05
Also in: lkml

quoted
quoted
On Wed, Jun 4, 2008 at  7:59 AM, in message <1212580796.23439.10.camel@twins>,
Peter Zijlstra [off-list ref] wrote: 
On Tue, 2008-06-03 at 23:24 -0400, Gregory Haskins wrote:
quoted
Hi Ingo, Steven, Thomas,
  This patch is critical to fix a snafu in the root-domain/cpupri code.  It
  applies to 25.4-rt4, but as mentioned below it affects most recent -rt
  kernels as well as sched-devel.  If you accept this patch, and you have
  non-trivial merge issues with other flavors just let me know and I will 
port
quoted
  it to those kernels as well.

Regards
-Greg

--------------------------------
sched: fix cpupri hotplug support

The RT folks over at RedHat found an issue w.r.t. hotplug support which
was traced to problems with the cpupri infrastructure in the scheduler:

https://bugzilla.redhat.com/show_bug.cgi?id=449676

This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
kernels mentioned above.

It turned out that the issue was that offline cpus could get inadvertently
registered with cpupri so that they were erroneously selected during
migration decisions.  The end result would be an OOPS as the offline cpu
had tasks routed to it.
This is not what I saw, I traced it to select_task_rq_rt() selecting a
cpu that never was online. In my case cpu 2 on a dual core.
First off, Peter, I didnt realize you were involved too.  So I apologize for omitting you in the credits. :(

Note that despite the reproduction case difference, I think we are still talking about the same basic problem here...and I think your observation would also be fixed by this patch.

The issue is that a cpu that is not online is being set to something other than INVALID in cpupri, so it comes up as a valid target during the various migration decisions.  This should now be fixed.

quoted
This patch adds a new per-sched-class pair of interfaces:
online/offline. They allow for the sched-class to register for
notification when a run-queue is taken online or offline.  Additionally,
the existing join/leave-domain code has been unified behind the new
online/offline handlers so that they do the right thing w.r.t. the online
status of a particular CPU.

I was able to easily reproduce the issue prior to this patch, and am no
longer able to reproduce it after this patch.  I can offline cpus
indefinately and everything seems to be in working order.

Thanks to Arnaldo (acme) and Thomas (tglx) for doing the legwork to point
me in the right direction. 

Signed-off-by: Gregory Haskins <redacted>
CC: Ingo Molnar <redacted>
CC: Thomas Gleixner <redacted>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Arnaldo Carvalho de Melo <redacted>
---

 include/linux/sched.h |    2 ++
 kernel/sched.c        |   19 +++++++++++++++++++
 kernel/sched_rt.c     |   16 ++++++++++++----
 3 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index bb71371..9c5b8c9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -970,6 +970,8 @@ struct sched_class {
 
 	void (*join_domain)(struct rq *rq);
 	void (*leave_domain)(struct rq *rq);
+	void (*online)(struct rq *rq);
+	void (*offline)(struct rq *rq);
Rather unfortunate to add yet another two callbacks, can't this be done
with a creative use of leave/join?
Yeah, I was debating how to do this.  In fact, as you can see the implementation of join/online and leave/offline is identical.  I wasnt sure if putting a join/leave in the hotplug code made sense, so I added a new callback that maps to the same logic.  Perhaps what I should have done was to s/join/online and s/leave/offline and fixed the root-domain code to use the online/offline variant.  I like this better so I will put out a v2 patch in a few minutes.
quoted
 	void (*switched_from) (struct rq *this_rq, struct task_struct *task,
 			       int running);
diff --git a/kernel/sched.c b/kernel/sched.c
index 0399411..fe96984 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6278,6 +6278,9 @@ static void unregister_sched_domain_sysctl(void)
 }
 #endif
 
+#define for_each_class(class) \
+   for (class = sched_class_highest; class; class = class->next)
+
 /*
  * migration_call - callback that gets triggered when a CPU is added.
  * Here we can start up the necessary migration thread for the new CPU.
@@ -6314,8 +6317,15 @@ migration_call(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
quoted
 		rq = cpu_rq(cpu);
 		spin_lock_irqsave(&rq->lock, flags);
 		if (rq->rd) {
+			const struct sched_class *class;
+
 			BUG_ON(!cpu_isset(cpu, rq->rd->span));
 			cpu_set(cpu, rq->rd->online);
+
+			for_each_class(class) {
+				if (class->online)
+					class->online(rq);
+			}
 		}
 		spin_unlock_irqrestore(&rq->lock, flags);
 		break;
@@ -6375,8 +6385,17 @@ migration_call(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
quoted
 		rq = cpu_rq(cpu);
 		spin_lock_irqsave(&rq->lock, flags);
 		if (rq->rd) {
+			const struct sched_class *class;
+
 			BUG_ON(!cpu_isset(cpu, rq->rd->span));
+
+			for_each_class(class) {
+				if (class->offline)
+					class->offline(rq);
+			}
+
 			cpu_clear(cpu, rq->rd->online);
+
 		}
 		spin_unlock_irqrestore(&rq->lock, flags);
 		break;
I'm thinking this should be done in update_sched_domains().
Hmm...I think I agree with you.  I will add this to v2 as well.
quoted
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 425cf01..8baf2e8 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1101,8 +1101,11 @@ static void set_cpus_allowed_rt(struct task_struct *p, 
cpumask_t *new_mask)
quoted
 }
 
 /* Assumes rq->lock is held */
-static void join_domain_rt(struct rq *rq)
+static void rq_online_rt(struct rq *rq)
 {
+	if (!cpu_isset(rq->cpu, rq->rd->online))
+		return;
+
 	if (rq->rt.overloaded)
 		rt_set_overload(rq);
 
@@ -1110,8 +1113,11 @@ static void join_domain_rt(struct rq *rq)
 }
 
 /* Assumes rq->lock is held */
-static void leave_domain_rt(struct rq *rq)
+static void rq_offline_rt(struct rq *rq)
 {
+	if (!cpu_isset(rq->cpu, rq->rd->online))
+		return;
+
 	if (rq->rt.overloaded)
 		rt_clear_overload(rq);
So you now fully remove the leave/join calls because switching between
root domains doesn't need to migrate the overload status between those
domains? - seems unlikely.
I think you are misinterpreting that part.  leave/join is fully functional and intact.  I just merged the two implementations to call one function (see the sched_class assignment below)
quoted
@@ -1278,8 +1284,10 @@ const struct sched_class rt_sched_class = {
 	.load_balance		= load_balance_rt,
 	.move_one_task		= move_one_task_rt,
 	.set_cpus_allowed       = set_cpus_allowed_rt,
-	.join_domain            = join_domain_rt,
-	.leave_domain           = leave_domain_rt,
+	.join_domain            = rq_online_rt,
+	.leave_domain           = rq_offline_rt,
+	.online                 = rq_online_rt,
+	.offline                = rq_offline_rt,
 	.pre_schedule		= pre_schedule_rt,
 	.post_schedule		= post_schedule_rt,
 	.task_wake_up		= task_wake_up_rt,
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help