RE: v3.18-RT
From: Carol Wong <hidden>
Date: 2016-09-20 18:27:19
Hi Sebastian,

You wrote:
> One thing on the bisect. The git tree has the patches in this order:
>   (1) kernel: migrate_disable() do fastpath in atomic & irqs-off
>   (2) kernel: softirq: unlock with irqs on
> but you need to apply patch #2 before #1. So if you bisect and you hit
> warnings due to #1, please note that you need to apply #2.
>
> T01 and T02 probably show the same issue, but there are too many warnings
> coming in parallel. If this comes from the sched patch due to the #1/#2
> mix-up, then don't bisect here, or have them both applied. The call path
> itself does look special, as it would violate the rule of atomic locking /
> unlocking (as it was fixed in #2, for instance). At this point I assume
> that your bisect went wrong due to patch #1/#2.
The traces were produced using the original 3.18.29-rt30 kernel (with all patches applied), plus the addition of WARN_ON_ONCE(p->migrate_disable_atomic <= 0) in migrate_enable() and CONFIG_SCHED_DEBUG=y.

When I revert only patch #1 from the 3.18.29-rt30 kernel, the kernel never crashes. I've been performing long-running tests on a dual Xeon system and a quad-core i7 system with patch #1 reverted.

Cheers,
Carol
-----Original Message-----
From: Sebastian Andrzej Siewior [mailto:bigeasy@linutronix.de]
Sent: Thursday, September 08, 2016 6:45 AM
To: Carol Wong
Cc: linux-rt-users@vger.kernel.org; David Hauck; Preston Hauck
Subject: Re: v3.18-RT

On 2016-08-19 00:41:46 [+0000], Carol Wong wrote:
> Hi Sebastian,

Hi Carol,
> Were you able to gain any insight from the traces?

Not really. T00 shows a fault in
  [ 2.756284] BUG: unable to handle kernel NULL pointer dereference at 00000004
  [ 2.756289] IP: [<c11653e7>] kmem_cache_alloc+0x87/0x230
from ida_pre_get() / create_worker(). That is quite late, so I have no idea why that would happen. The other two are not really helpful.
> If we were to proceed with reverting the kernel/sched/core.c patch in our
> build of 3.18.29-rt30, would the addition of the
> WARN_ON_ONCE(p->migrate_disable_atomic <= 0) debug check that you
> recommended (2016/07/29) be sufficient for detecting imbalances? We would
> perform extended testing on multiple systems to determine the effects of
> reverting the patch.

One thing on the bisect. The git tree has the patches in this order:
  (1) kernel: migrate_disable() do fastpath in atomic & irqs-off
  (2) kernel: softirq: unlock with irqs on
but you need to apply patch #2 before #1. So if you bisect and you hit warnings due to #1, please note that you need to apply #2.

T01 and T02 probably show the same issue, but there are too many warnings coming in parallel. If this comes from the sched patch due to the #1/#2 mix-up, then don't bisect here, or have them both applied. The call path itself does look special, as it would violate the rule of atomic locking / unlocking (as it was fixed in #2, for instance). At this point I assume that your bisect went wrong due to patch #1/#2.
> Cheers,
> Carol

Sebastian
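
As a rough illustration of what the WARN_ON_ONCE(p->migrate_disable_atomic <= 0) check in migrate_enable() is meant to catch, here is a minimal user-space sketch in C. It is not the kernel code. struct task_model, warn_on_once(), model_migrate_disable() and model_migrate_enable() are hypothetical names, the `atomic` flag stands in for the "atomic or irqs-off" condition named in patch #1's title, and placing the check just before the fast-path decrement is an assumption, since the thread does not say where inside migrate_enable() it was added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-task counters; NOT the kernel's task_struct. */
struct task_model {
    int migrate_disable;        /* nesting depth taken on the normal path */
    int migrate_disable_atomic; /* nesting depth taken on the atomic fast path */
    int warnings;               /* 1 once the check has fired, else 0 */
};

/* User-space imitation of a warn-once check: report only the first violation. */
bool warn_on_once(struct task_model *t, bool cond, const char *what)
{
    assert(t != NULL);
    if (cond && t->warnings == 0) {
        t->warnings = 1;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Disable: count on the fast path when called in "atomic" context. */
void model_migrate_disable(struct task_model *t, bool atomic)
{
    assert(t != NULL);
    if (atomic)
        t->migrate_disable_atomic++;
    else
        t->migrate_disable++;
}

/* Enable: the fast path must undo a fast-path disable, otherwise warn. */
void model_migrate_enable(struct task_model *t, bool atomic)
{
    assert(t != NULL);
    if (atomic) {
        /* Assumed placement of the debug check discussed in the thread. */
        warn_on_once(t, t->migrate_disable_atomic <= 0,
                     "migrate_enable() in atomic context without matching disable");
        t->migrate_disable_atomic--;
    } else {
        t->migrate_disable--;
    }
}
```

In this model a disable/enable pair made in the same context leaves `warnings` at 0. A disable on the normal path followed by an enable in atomic context drives `migrate_disable_atomic` negative and trips the check once, which is the kind of mismatched locking / unlocking context the thread is concerned with.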