RE: v3.18-RT
From: David Hauck <hidden>
Date: 2016-06-06 17:45:17
Hi Sebastian,

On Monday, June 06, 2016 12:02 AM, Sebastian Andrzej Siewior wrote:
> On 06/03/2016 07:15 PM, David Hauck wrote:
>
> Hi David,
>
>> On Fri, 3 Jun 2016 at 09:38:00, Sebastian Andrzej Siewior wrote:
>>> I am not aware of any lockup on the v3.18-RT tree. I just tried a few
>>> boot-ups on two machines and it looks good. I currently don't have
>>> control of anything with more than 4 cores.
>>
>> Thx. We've done further testing and see that v3.18.9 does not suffer
>> the same problem. I also have some dump information (all "unable to
>> handle kernel paging request") and was wondering what the best way to
>> pass this along to the list might be? Would a compressed archive of
>> the (4) log files be OK to send along?
>
> That "unable to handle kernel paging request" shouldn't be much. Please
> send it to the list. The first BUG backtrace is the important one.

Thx, here's one - hope this might be helpful:

[ 1.352165] BUG: unable to handle kernel [ 1.352167] paging request [ 1.352169] at a93c2560
[ 1.352172] IP: [ 1.352178] [<c107c248>] can_migrate_task+0x58/0x220
[ 1.352182] *pde = 00000000 [ 1.352183]
[ 1.352187] Oops: 0000 [#1] [ 1.352189] PREEMPT [ 1.352190] SMP [ 1.352191]
[ 1.352194] Modules linked in: [ 1.352194]
[ 1.352198] CPU: 5 PID: 238 Comm: kthreadd Not tainted 3.18.29-rt30 #2
[ 1.352201] Hardware name: Default string Default string/HEP8225, BIOS HEPHF107 05/20/2016
[ 1.352205] task: db7d1d40 ti: db27e000 task.ti: db27e000
[ 1.352208] EIP: 0060:[<c107c248>] EFLAGS: 00010086 CPU: 5
[ 1.352212] EIP is at can_migrate_task+0x58/0x220
[ 1.352215] EAX: 00000005 EBX: db27fe10 ECX: 1830b404 EDX: a93c2560
[ 1.352217] ESI: dc508000 EDI: 00000002 EBP: db27fdbc ESP: db27fdb0
[ 1.352220] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1.352223] CR0: 80050033 CR2: a93c2560 CR3: 01a3a000 CR4: 001407d0
[ 1.352225] Stack:
[ 1.352228] dc50805c [ 1.352230] 00000367 [ 1.352232] dcb53b88 [ 1.352234] db27fe60 [ 1.352236] c1083f81 [ 1.352238] 00000000 [ 1.352239] 00000000 [ 1.352241] 00000005 [ 1.352242]
[ 1.352245] dcae5b30 [ 1.352246] dbe5d608 [ 1.352248] 00000000 [ 1.352250] dbe5d600 [ 1.352252] c1a30660 [ 1.352253] c1a30660 [ 1.352255] 00000000 [ 1.352257] 00000082 [ 1.352258]
[ 1.352261] dcb53660 [ 1.352263] db27fe60 [ 1.352265] dbe3a1c0 [ 1.352266] dcb53b88 [ 1.352268] dcb53660 [ 1.352270] dc508000 [ 1.352272] 000003ef [ 1.352274] 0000019b [ 1.352275]
[ 1.352277] Call Trace:
[ 1.352284] [<c1083f81>] load_balance+0x321/0x8b0
[ 1.352293] [<c1084ad8>] pick_next_task_fair+0x5c8/0xb10
[ 1.352300] [<c1072bd1>] ? dequeue_task+0x91/0xc0
[ 1.352307] [<c16bce2a>] __schedule+0xfa/0xae0
[ 1.352313] [<c16c06c7>] ? _raw_spin_unlock_irqrestore+0x17/0x50
[ 1.352320] [<c107699b>] ? try_to_wake_up+0x5b/0x550
[ 1.352324] [<c1076f5f>] ? wake_up_state+0xf/0x20
[ 1.352330] [<c108b0cc>] ? __swait_wake_locked+0x3c/0x80
[ 1.352336] [<c1065930>] ? process_one_work+0x410/0x410
[ 1.352342] [<c16bd83b>] schedule+0x2b/0x90
[ 1.352347] [<c1069e73>] kthread+0x73/0xb0
[ 1.352354] [<c1060000>] ? SyS_olduname+0x100/0x180
[ 1.352360] [<c16c1081>] ret_from_kernel_thread+0x21/0x30
[ 1.352365] [<c1069e00>] ? kthread_worker_fn+0x160/0x160
[ 1.352368] Code: [ 1.352370] 00 [ 1.352372] 8b [ 1.352374] be [ 1.352376] 54 [ 1.352378] 02 [ 1.352379] 00 [ 1.352381] 00 [ 1.352383] 8d [ 1.352385] 96 [ 1.352387] 60 [ 1.352389] 02 [ 1.352390] 00 [ 1.352392] 00 [ 1.352394] 85 [ 1.352396] ff [ 1.352398] 74 [ 1.352400] 1a [ 1.352402] 8b [ 1.352403] 56 [ 1.352405] 08 [ 1.352407] 8b [ 1.352409] 4a [ 1.352410] 10 [ 1.352412] 89 [ 1.352414] ca [ 1.352415] 83 [ 1.352417] e2 [ 1.352419] 1f [ 1.352420] c1 [ 1.352422] e9 [ 1.352424] 05 [ 1.352426] 8d [ 1.352427] 14 [ 1.352429] 95 [ 1.352431] 24 [ 1.352432] d9 [ 1.352434] 6c [ 1.352436] c1 [ 1.352437] c1 [ 1.352439] e1 [ 1.352441] 02 [ 1.352443] 29 [ 1.352445] ca [ 1.352447] <0f> [ 1.352448] a3 [ 1.352450] 02 [ 1.352452] 19 [ 1.352454] c0 [ 1.352455] 85 [ 1.352457] c0 [ 1.352459] 0f [ 1.352461] 85 [ 1.352462] c3 [ 1.352464] 00 [ 1.352466] 00 [ 1.352467] 00 [ 1.352469] 83 [ 1.352471] 86 [ 1.352472] 00 [ 1.352474] 01 [ 1.352476] 00 [ 1.352478] 00 [ 1.352480] 01 [ 1.352481] 83 [ 1.352482]
[ 1.352485] EIP: [<c107c248>] [ 1.352489] can_migrate_task+0x58/0x220 [ 1.352490] SS:ESP 0068:db27fdb0
[ 1.352493] CR2: 00000000a93c2560
[ 71.711659] ---[ end trace 0000000000000001 ]---
[ 71.711661] note: kthreadd[238] exited with preempt_count 2
[ 71.711666] WARNING: CPU: 5 PID: 238 at kernel/smp.c:293 smp_call_function_single+0xb4/0xe0()
[ 71.711667] Modules linked in:
[ 71.711669] CPU: 5 PID: 238 Comm: kthreadd Tainted: G D 3.18.29-rt30 #2
[ 71.711669] Hardware name: Default string Default string/HEP8225, BIOS HEPHF107 05/20/2016
[ 71.711671] 00000000 00000000 db27fb60 c16bb56f 00000000 db27fb94 c104f078 c185a058
[ 71.711673] 00000005 000000ee c18504dc 00000125 c10bf314 00000125 c10bf314 ffffffff
[ 71.711675] 00000005 c110fac0 db27fba4 c104f140 00000009 00000000 db27fbc4 c10bf314
[ 71.711675] Call Trace:
[ 71.711678] [<c16bb56f>] dump_stack+0x46/0x5c
[ 71.711680] [<c104f078>] warn_slowpath_common+0x88/0xb0
[ 71.711681] [<c10bf314>] ? smp_call_function_single+0xb4/0xe0
[ 71.711682] [<c10bf314>] ? smp_call_function_single+0xb4/0xe0
[ 71.711685] [<c110fac0>] ? cpu_clock_event_add+0x20/0x20
[ 71.711686] [<c104f140>] warn_slowpath_null+0x20/0x30
[ 71.711687] [<c10bf314>] smp_call_function_single+0xb4/0xe0
[ 71.711689] [<c110fbc0>] ? perf_event_disable+0x90/0x90
[ 71.711691] [<c110e9fc>] task_function_call+0x3c/0x50
[ 71.711692] [<c1114fe0>] ? perf_cgroup_switch+0x1f0/0x1f0
[ 71.711694] [<c110fbdf>] perf_cgroup_exit+0x1f/0x30
[ 71.711696] [<c10cefd3>] cgroup_exit+0xb3/0x100
[ 71.711698] [<c105084a>] do_exit+0x32a/0x9c0
[ 71.711699] [<c16bab81>] ? printk+0x1c/0x1e
[ 71.711702] [<c1099d8b>] ? kmsg_dump+0xcb/0xd0
[ 71.711704] [<c1005eff>] oops_end+0x8f/0xd0
[ 71.711707] [<c1041430>] no_context+0xf0/0x230
[ 71.711709] [<c1041625>] __bad_area_nosemaphore+0xb5/0x150
[ 71.711711] [<c10834cd>] ? update_sd_lb_stats+0x12d/0x3d0
[ 71.711713] [<c10416d7>] bad_area_nosemaphore+0x17/0x20
[ 71.711714] [<c1041bbb>] __do_page_fault+0x9b/0x620
[ 71.711716] [<c10837a9>] ? find_busiest_group+0x39/0x4f0
[ 71.711719] [<c1042140>] ? __do_page_fault+0x620/0x620
[ 71.711720] [<c104214b>] do_page_fault+0xb/0x10
[ 71.711721] [<c16c1e3a>] error_code+0x5a/0x60
[ 71.711723] [<c1042140>] ? __do_page_fault+0x620/0x620
[ 71.711725] [<c107c248>] ? can_migrate_task+0x58/0x220
[ 71.711726] [<c1083f81>] load_balance+0x321/0x8b0
[ 71.711729] [<c1084ad8>] pick_next_task_fair+0x5c8/0xb10
[ 71.711731] [<c1072bd1>] ? dequeue_task+0x91/0xc0
[ 71.711733] [<c16bce2a>] __schedule+0xfa/0xae0
[ 71.711734] [<c16c06c7>] ? _raw_spin_unlock_irqrestore+0x17/0x50
[ 71.711736] [<c107699b>] ? try_to_wake_up+0x5b/0x550
[ 71.711737] [<c1076f5f>] ? wake_up_state+0xf/0x20
[ 71.711738] [<c108b0cc>] ? __swait_wake_locked+0x3c/0x80
[ 71.711740] [<c1065930>] ? process_one_work+0x410/0x410
[ 71.711741] [<c16bd83b>] schedule+0x2b/0x90
[ 71.711743] [<c1069e73>] kthread+0x73/0xb0
[ 71.711744] [<c1060000>] ? SyS_olduname+0x100/0x180
[ 71.711746] [<c16c1081>] ret_from_kernel_thread+0x21/0x30
[ 71.711747] [<c1069e00>] ? kthread_worker_fn+0x160/0x160
[ 71.711748] ---[ end trace 0000000000000002 ]---
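
For anyone post-processing a trace like the one above, here is a minimal Python sketch of how the `[<addr>] symbol+0xoff/0xsize` frames could be pulled out and the start address of each function recovered (addr minus offset). The names `FRAME_RE` and `parse_frames` are made up for this illustration and are not part of any kernel tooling; the sample string only reuses three frames from the log above.

```python
import re

# Matches frames such as "[<c1072bd1>] ? dequeue_task+0x91/0xc0".
# A leading "?" marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return one dict per stack frame found in log_text (hypothetical helper)."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        addr = int(m.group("addr"), 16)
        offset = int(m.group("offset"), 16)
        frames.append({
            "addr": addr,
            "symbol": m.group("symbol"),
            "offset": offset,
            "size": int(m.group("size"), 16),
            # Start of the function = reported address minus offset.
            "func_start": addr - offset,
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Three frames copied from the oops above, flattened onto one line.
sample = (
    "[ 1.352178] [<c107c248>] can_migrate_task+0x58/0x220 "
    "[ 1.352284] [<c1083f81>] load_balance+0x321/0x8b0 "
    "[ 1.352300] [<c1072bd1>] ? dequeue_task+0x91/0xc0"
)

frames = parse_frames(sample)
for f in frames:
    marker = " " if f["reliable"] else "?"
    print(f'{marker} {f["symbol"]} starts at {f["func_start"]:#010x}, '
          f'offset {f["offset"]:#x} of {f["size"]:#x}')
```

The symbol plus offset is typically what one would hand to gdb or addr2line against a vmlinux built with debug info for the same kernel configuration, assuming such a build is available.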

> Also, if you say that the v3.18.9-based RT tree worked, could you please
> try v3.18.13-rt10? If that works too, could you use the git tree
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/
> and start a bisect between v3.18.13-rt10 and v3.18.29-rt30?

Great, thx, we'll get started on this this week.

Thanks again,
-David