Re: [BUG] 2.6.24-rc3-git2 softlockup detected
From: Andrew Morton <akpm@linux-foundation.org>
Date: 2007-11-29 08:37:27
Also in:
linux-scsi, lkml
On Thu, 29 Nov 2007 12:01:08 +0530 Kamalesh Babulal [off-list ref] wrote:
Andrew Morton wrote:quoted
On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal [off-list ref] wrote:quoted
Andrew Morton wrote:quoted
On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal [off-list ref] wrote:quoted
Hi,(cc linux-scsi, for sym53c8xx)quoted
Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerboxI assume this is a post-2.6.23 regression?quoted
BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375] NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018 REGS: c00000077cbef0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git2-autotest) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000 TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1 GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014 GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54 GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018 GPR12: d000000000143610 c000000000473d00 NIP [c00000000002f02c] .ioread8+0x14/0x60 LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx] Call Trace: [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable) [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx] [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx] [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0 [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154 [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4 [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40 [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228 [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0 [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx] [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958 [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40 Instruction dump: 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ffI see no obvious lockup sites near the end of sym_hcb_attach(). Maybe it's being called lots of times from a higher level.. Do the traces all look the same?Hi Andrew, I see this call trace twice and both looks similar and on another reboot the following trace is seen twice in different cpu BUG: soft lockup detected on CPU#3! Call Trace: [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable) [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30 [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488 [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100--- Exception: 901 at .ioread8+0x14/0x60 LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx][C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable) [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx] [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx] [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138 [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4 [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40 [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4 [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4 [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx] [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998 [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40hm, odd. Can you look up sym_hcb_attach+0x1194/0x1384 in gdb? Something likeHi Andrew, I tried with 2.6.24-rc3-git3 and got the following trace BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375] NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018 REGS: c00000077ca3b0b0 TRAP: 0901 Not tainted (2.6.24-rc3-git3-autokern1) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24022088 XER: 00000000 TASK = c00000077cc58000[375] 'insmod' THREAD: c00000077ca38000 CPU: 2 GPR00: d0000000001414fc c00000077ca3b330 c00000000052b880 d000080080002014 GPR04: d00008008000202c 0000000000000000 c00000077c82eb00 d00000000014ce54 GPR08: c00000077c82e63c 0000000000000000 000000000000002a c00000000002f018 GPR12: d000000000143610 c000000000473f80 NIP [c00000000002f02c] .ioread8+0x14/0x60 LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx] Call Trace: [c00000077ca3b330] [c00000077ca3b3c0] 0xc00000077ca3b3c0 (unreliable) [c00000077ca3b3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx] [c00000077ca3b470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx] [c00000077ca3b710] [c0000000001bc098] .pci_device_probe+0x124/0x1b0 [c00000077ca3b7b0] [c0000000002210c4] .driver_probe_device+0x144/0x20c [c00000077ca3b850] [c0000000002213dc] .__driver_attach+0xcc/0x154 [c00000077ca3b8e0] [c00000000021ff20] .bus_for_each_dev+0x7c/0xd4 [c00000077ca3b9a0] [c000000000220e28] .driver_attach+0x28/0x40 [c00000077ca3ba20] [c000000000220464] .bus_add_driver+0x90/0x228 [c00000077ca3bac0] [c0000000002217e4] .driver_register+0x94/0xb0 [c00000077ca3bb40] [c0000000001bc3b0] .__pci_register_driver+0x6c/0xcc [c00000077ca3bbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx] [c00000077ca3bc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958 [c00000077ca3be30] [c00000000000872c] syscall_exit+0x0/0x40 Instruction dump: 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff The gdb list the following for the above trace 0xa4fc is in sym_hcb_attach (drivers/scsi/sym53c8xx_2/sym_hipd.c:1041). 1036 OUTL_DSP(np, pc); 1037 /* 1038 * Wait 'til done (with timeout) 1039 */ 1040 for (i=0; i<SYM_SNOOP_TIMEOUT; i++) 1041 if (INB(np, nc_istat) & (INTF|SIP|DIP)) 1042 break; 1043 if (i>=SYM_SNOOP_TIMEOUT) { 1044 printf ("CACHE TEST FAILED: timeout.\n"); 1045 return (0x20);
doh, I missed that. #define SYM_SNOOP_TIMEOUT (10000000) ten million is close enough to infinity for me to assume that we broke the driver and that's never going to terminate. otoh, if that's true you should have got the "CACHE TEST FAILED: timeout" message. Did you? And does the driver actually work OK after this? If it is indeed expected that a ~10 second stall in there is correct behaviour then all we need to do is do make that loop a bit smarter (10,000 msleep(1)'s, for example).