Thread: 11 messages, 3 authors, 2016-09-20
  • v3.18-RT · David Hauck <hidden> · 2016-05-31
  • Re: v3.18-RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2016-06-03
  • RE: v3.18-RT · David Hauck <hidden> · 2016-06-03
  • Re: v3.18-RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2016-06-06
  • RE: v3.18-RT · David Hauck <hidden> · 2016-06-06
  • Re: v3.18-RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2016-06-07
  • RE: v3.18-RT · Carol Wong <hidden> · 2016-07-20
  • Re: v3.18-RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2016-07-29
  • RE: v3.18-RT · Carol Wong <hidden> · 2016-08-19
  • Re: v3.18-RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2016-09-08
  • RE: v3.18-RT · Carol Wong <hidden> · 2016-09-20

RE: v3.18-RT

From: Carol Wong <hidden>
Date: 2016-08-19 00:49:01

Hi Sebastian,

Were you able to gain any insight from the traces?

If we were to proceed with reverting the kernel/sched/core.c patch in our build of 3.18.29-rt30, would the addition of the WARN_ON_ONCE(p->migrate_disable_atomic <= 0) debug check that you recommended (2016/07/29) be sufficient for detecting imbalances? We would perform extended testing on multiple systems to determine the effects of reverting the patch.

Cheers,
Carol
-----Original Message-----
From: Carol Wong
Sent: Wednesday, August 03, 2016 6:32 PM
To: 'Sebastian Andrzej Siewior'
Cc: linux-rt-users@vger.kernel.org; David Hauck; Preston Hauck
Subject: RE: v3.18-RT

Hi Sebastian,

I made the suggested change to sched/core.c and verified that
CONFIG_SCHED_DEBUG=y. I reproduced the crash 3 times and captured the
attached traces.

Thanks,
Carol
-----Original Message-----
From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
Sent: Friday, July 29, 2016 9:20 AM
To: Carol Wong
Cc: linux-rt-users@vger.kernel.org; David Hauck; Preston Hauck
Subject: Re: v3.18-RT

* Carol Wong | 2016-07-20 20:53:21 [+0000]:

> Hi Sebastian,

Hi Carol,

> We finally traced the boot-up crash to the following patch in
> kernel/sched/core.c:
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.18-rt&id=62044e554f14547061afcfef7f0aceda43e28982
>
> After reverting the two-line patch in 3.18.29-rt30, the crash no
> longer occurs on our dual Xeon (2x12 core) system.
>
> Other observations:
> - Does not reproduce on single processor (2 and 4 core) systems
> - Reproduces under 3.18.27-rt27 and 3.18.36-rt38 on the dual Xeon
> - Does not reproduce on 3.18.27-rt26 and earlier on the dual Xeon
> - Reproduces more frequently on .29-rt30 (1 in 20 reboots) compared
>   to .27-rt27 (1 in 100 reboots)
>
> So far we've not observed any side effects after reverting this
> patch.

This was part of the CPU hotplug fixups. Lockdep might be broken without
it, but I am not sure whether that is the case most of the time or only
during hotplug.

> I understand that a high core count system may not be easy to come
> by, so if there are diagnostics or patches you would like to try on
> the dual Xeon system, we can assist with that.

With that patch, migrate_disable() skips the whole preempt-lazy +
pin-cpu code if called with IRQs off. Since interrupts are disabled, we
can't migrate to another CPU, so it is a possible optimisation.
It only makes a difference if migrate_disable() + migrate_enable()
calls are not in balance. The commit
  https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.18-rt&id=8d51d3a296b6ec4aebd0d6d7e1b7162cd9bf6662
is one example where I fixed the imbalance.
Do you get additional backtraces with CONFIG_SCHED_DEBUG enabled?
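
The imbalance case described above can be illustrated with a small
userspace model. This is a sketch only, not kernel code: struct
task_model, model_migrate_disable(), model_migrate_enable() and
model_balanced() are hypothetical names, the `atomic` flag stands in for
in_atomic() || irqs_disabled(), and incrementing migrate_disable_atomic
on the disable fast path is an assumption made by symmetry with the
decrement in migrate_enable().

```c
#include <stdbool.h>

/* Hypothetical userspace model of the two counters; not kernel code. */
struct task_model {
	int migrate_disable;        /* nesting depth taken on the slow path */
	int migrate_disable_atomic; /* debug-only count of fast-path calls */
	int pinned;                 /* 1 while the task is pinned to its CPU */
};

/* `atomic` stands in for in_atomic() || irqs_disabled(). */
static void model_migrate_disable(struct task_model *p, bool atomic)
{
	if (atomic) {
		/* Fast path: the pin-cpu work is skipped entirely. */
		p->migrate_disable_atomic++;
		return;
	}
	/* Slow path: the first nesting level pins the task. */
	if (p->migrate_disable++ == 0)
		p->pinned = 1;
}

static void model_migrate_enable(struct task_model *p, bool atomic)
{
	if (atomic) {
		/* Fast path: only the debug counter is touched. */
		p->migrate_disable_atomic--;
		return;
	}
	/* Slow path: the last nesting level unpins the task. */
	if (--p->migrate_disable == 0)
		p->pinned = 0;
}

/* True when both counters have returned to zero. */
static bool model_balanced(const struct task_model *p)
{
	return p->migrate_disable == 0 && p->migrate_disable_atomic == 0;
}
```

In this model, a disable with IRQs off followed by an enable with IRQs
on leaves migrate_disable at -1 and migrate_disable_atomic at 1. A pair
taken on the same path returns both counters to zero, which is why the
fast path only matters when the calls are not in balance.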

There is one thing the debug code does not cover, so could you please
add this chunk?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 140ee06079b6..1f8613f77598 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3229,6 +3229,7 @@ void migrate_enable(void)

 	if (in_atomic() || irqs_disabled()) {
 #ifdef CONFIG_SCHED_DEBUG
+		WARN_ON_ONCE(p->migrate_disable_atomic <= 0);
 		p->migrate_disable_atomic--;
 #endif
 		return;

> Cheers,
> Carol Wong
> NetAcquire Corporation

Sebastian
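
What the added WARN_ON_ONCE() check can and cannot flag may be easier to
see in a userspace model. The sketch below is an illustration under
stated assumptions, not the kernel implementation: model_warn_once(),
model_warn_count, struct task_dbg and the two fast-path helpers are
hypothetical names, and the warn-once behaviour is approximated with one
static flag per call site.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings actually reported (hypothetical, for inspection). */
static int model_warn_count;

/*
 * Userspace stand-in for a warn-once check: reports only the first time
 * `cond` is true for the given flag, and returns `cond` every time.
 */
static bool model_warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING (once): %s\n", what);
	}
	return cond;
}

/* Hypothetical model of the debug-only counter; not kernel code. */
struct task_dbg {
	int migrate_disable_atomic;
};

/* Assumed counterpart on the disable side: count the fast-path call. */
static void model_disable_fast_path(struct task_dbg *p)
{
	p->migrate_disable_atomic++;
}

/* Models the chunk above: check the counter, then decrement it. */
static void model_enable_fast_path(struct task_dbg *p)
{
	static bool warned; /* one flag per call site */

	model_warn_once(p->migrate_disable_atomic <= 0, &warned,
			"migrate_disable_atomic <= 0 before decrement");
	p->migrate_disable_atomic--;
}
```

In this model the check fires on the first fast-path enable that has no
matching fast-path disable, and stays silent afterwards. A surplus
fast-path disable leaves the counter positive and is not reported by
this check alone, so it would have to be caught elsewhere.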