Re: IRQ issues with multiple SiI3114's on Kernel 3.2
From: Stirling Westrup <hidden>
Date: 2012-07-28 23:41:44
On Sat, Jul 28, 2012 at 1:48 PM, Stirling Westrup [off-list ref] wrote:
On Sat, Jul 28, 2012 at 5:10 AM, Stan Hoeppner [off-list ref] wrote:
On 7/27/2012 9:20 PM, Stirling Westrup wrote:
On Fri, Jul 27, 2012 at 6:14 PM, Stirling Westrup [off-list ref] wrote:
On Fri, Jul 27, 2012 at 1:24 PM, Stan Hoeppner [off-list ref] wrote:
On 7/27/2012 11:40 AM, Stirling Westrup wrote:
I recently purchased a large system for use as a backup server for a pair of small businesses. It contains a boot drive plus 10 more storage drives. Despite having three onboard SATA controllers, the motherboard didn't have enough SATA connectors for all the drives, so I installed a pair of identical SiI3114 RAID cards to handle the extra connections. It has a Sandy Bridge chipset, so I installed a 3.2 kernel.
# uname -a
Linux ttt 3.2.0-0.bpo.2-amd64 #1 SMP Fri Jun 29 20:42:29 UTC 2012 x86_64 GNU/Linux
...
Okay, enough background. Here's the issue: I had no trouble building and sync'ing the first array, but when I try to sync the second array, I always get the following dmesg output an hour or so into the process:
irq 19: nobody cared (try booting with the "irqpoll" option)
[ 346.120572] Pid: 1100, comm: md1_resync Not tainted 3.2.0-0.bpo.2-amd64 #1
[ 346.120573] Call Trace:
...
[ 346.120697] handlers:
[ 346.120699] [<ffffffffa00479e0>] ahci_interrupt
[ 346.120702] [<ffffffffa02f17ec>] sil_interrupt
[ 346.120703] Disabling IRQ #19
[ 346.122145] sched: RT throttling activated
...
From this point onward syncing drops to a tiny fraction of its previous speed. I've tried booting with 'irqpoll' as the error message suggests, but it has had no effect. I'm really not sure if there is a conflict between my two SiI3114's or between the SiI's and the Marvell controller (although I've never had an issue with Marvell in the past), nor how to go about diagnosing or fixing this. I'll include a full dmesg dump below, as well as my currently loaded modules. If anyone wants any further info, just ask.
Have you tried irqbalance to spread the interrupts across cores/cache domains? https://irqbalance.org/documentation.html
Thanks for the tip! I installed irqbalance and rebooted the system, and everything has been running smoothly for the last two hours. I'll let everyone know tomorrow if it actually finished the full 20-hour resync without incident.
Alas, all it did was delay the IRQ error by a few hours. Does anyone else have any ideas about how I could tackle this?
Try irqpoll and irqbalance together. Also, which motherboard is this? Exact make/model, please. May be a BIOS issue. Doesn't seem to be using MSIs. If the mobo and cards all support MSIs, enabling that may fix this as well. Also, what make/model are the SiI3114 cards? PCIe or PCI?
The motherboard is an Asus P8768-V Pro/Gen3 and has full MSI support, but the cards are labeled "Syba SiI 3114 PCI to 4 Port Sata 150" and don't support MSI.
Have you tried different slot combinations? Moving one card to a different slot may get it routed to PCI INTB instead of INTA. That may get it mapped to something other than IRQ#19. Updating the 3114 boards to their latest firmware is worth a shot, if not there already.
At one point the system was mapping the interrupt to IRQ#17 instead of 19, but it still failed. I haven't tried moving the cards to different slots or anything but, IIRC, it only has two PCI slots. I also have yet to try upgrading the BIOS of the mobo or updating the card firmware. I was hoping to have a better idea of what was going wrong before going down that route. I also wonder if the problem could be kernel or libata related (which is why I'm asking in this forum).
Okay, it looks like it's a known hardware chipset problem, first reported six months ago. It affects all PCI cards in Asus Sandy Bridge motherboards. No known fix as of yet.
https://lkml.org/lkml/2012/1/30/216
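For anyone who wants to check whether they are in the same shared-line situation, here is a rough sketch that scans /proc/interrupts for numbered IRQs with more than one handler attached. The function name shared_irqs and the SAMPLE text are made up for illustration (the sample is not taken from the machine in this thread), and the parsing assumes the usual layout of one count column per CPU followed by the chip name and a comma-separated handler list, with no spaces inside handler names. That layout varies a little between kernel versions, so treat the output as a hint only.

```python
import os

# Illustrative sample only; not captured from the system discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 19:     123456          0   IO-APIC-fasteoi   ahci, sata_sil
 42:       1000       2000   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
"""


def shared_irqs(text):
    """Return {irq_number: [handler names]} for IRQ lines with 2+ handlers."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other named rows
        fields = rest.split(None, ncpus)  # per-CPU counts, then the description
        if len(fields) <= ncpus:
            continue  # no description column on this row
        parts = [p.strip() for p in fields[ncpus].split(",")]
        if len(parts) < 2 or not parts[0]:
            continue  # only one handler on this line
        # The first piece still carries the chip/trigger name; keep the last word.
        parts[0] = parts[0].split()[-1]
        shared[int(label)] = parts
    return shared


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as fh:
            text = fh.read()
    else:
        text = SAMPLE  # fall back to the illustrative sample
    for irq, handlers in sorted(shared_irqs(text).items()):
        print("IRQ %d shared by: %s" % (irq, ", ".join(handlers)))
```

A line listing two or more handlers, where none of them is on an MSI vector, is the kind of arrangement that the "nobody cared" message above is complaining about. It does not tell you which device is actually raising the unhandled interrupt.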