Re: System hangs on raid md recovery/resync - revisit
From: Justin Piszcz <hidden>
Date: 2009-02-28 12:04:17
On Sat, 28 Feb 2009, Brad wrote:
On Sat, Feb 28, 2009 at 7:08 PM, Justin Piszcz wrote:
On Sat, 28 Feb 2009, Brad wrote:
Hi. I'd like to revisit a problem I put to the mailing list on the 27th July 2008. My linux system hangs if I have a lengthy recovery of a raid-1 device going on at the same time as any significant network traffic. ...
I have the same mobo:
Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Gigabyte Technology Co., Ltd.
        Product Name: P35-DS4
How did you get that information, please? Another linux command for me to learn?! :-)
dmidecode | more
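To pull a single value out of that output instead of paging through it, a small filter can help. This is a minimal sketch, assuming the usual indented `Label: value` layout that dmidecode prints; `dmi_field` is a hypothetical helper name, not part of dmidecode.

```shell
# dmi_field LABEL: read dmidecode-style text on stdin and print the value
# of the first "LABEL: value" line. (dmi_field is a hypothetical helper.)
dmi_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)             # drop leading indentation
            if (index(line, key ": ") == 1) {    # line begins with "LABEL: "
                print substr(line, length(key) + 3)
                exit
            }
        }'
}

# Typical use (dmidecode needs root to read the DMI tables):
#   dmidecode -t system | dmi_field "Product Name"
```

dmidecode can also print one value directly, e.g. `dmidecode -s system-product-name`; the filter is only useful when the full output has already been saved somewhere.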
Have a RAID1 and RAID5, I do not use the jmicron SATA ports, only the intel ones and add-on pci-e cards, never had any problems with the raid volumes. The NIC is sort of flaky though [in linux], I recommend using an intel pci-e 1gbps card.
I've had another problem with the Realtek network driver ... under network load it seemed to miss interrupts and/or pass them to the IDE driver, which would print out errors about unexpected/unknown interrupts. I had to take IDE out of my kernel.
Correct. Buy an Intel 1 Gbps PCI-e card. I do this for all of my main machines that do not have Intel NICs, and it solves the problem. They cost $30-40.
I *think* my current hanging problem was even worse when the pata_jmicron driver module - which I need to use the ATA DVD drive connected to one of the JMicron's IDE ports - shared the same interrupt as the Realtek driver.
Hm, no, I also use this jmicron driver and have no problems, but I no longer use the Realtek NIC. One piece of advice, though: on Gigabyte boards in general, the RAM timings etc. have to be set just right, otherwise weird things happen. I have seen the motherboard freeze, lock up and do weird things, mainly before I had the memory settings set correctly. Run memtest86 for at least 1-2 passes and ENSURE you have no errors. If you have errors, then the memory timings/parameters are set incorrectly, and this can cause system instability even though the memory itself is not bad. (I tested the RAM in another machine: no errors. Moved it to the Gigabyte board with default settings: memory errors and, hence, system instability!)
I couldn't find a way to change interrupts (can one do that at will with the Linux kernel?) so my backup script unloads the pata_jmicron module before it attaches the third backup disk to the md1 array.
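To see which drivers currently share an interrupt line, /proc/interrupts can be filtered. This is a minimal sketch, assuming the usual layout in which handlers sharing an IRQ are listed comma-separated at the end of the line; `shared_irqs` is a hypothetical helper name. It only reports sharing and does not change any assignment.

```shell
# shared_irqs [FILE]: print the numbered IRQ lines that list more than one
# handler. FILE defaults to /proc/interrupts. (shared_irqs is a hypothetical
# helper; the comma test is a heuristic based on the usual file layout.)
shared_irqs() {
    awk '/^[ \t]*[0-9]+:/ && /, /' "${1:-/proc/interrupts}"
}

# Typical use, before and after unloading a module:
#   shared_irqs
```

Running it before and after unloading pata_jmicron would show whether the Realtek NIC still shares its line with anything else.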
I hardly ever use modules, and I do not understand why people do, at least for their main OS/system drivers. For cameras, USB devices, etc., I can see how that would be useful, but I compile everything in when possible, and only what is necessary.
But it still hangs if there's any significant network traffic. Maybe, even though I've gotten rid of anything using the same IRQ as the Realtek - IDE or pata_jmicron - the NIC driver is still flubbing interrupts and that's confusing the kernel?
How often do you use the CD/DVD drive? There are SATA drives for $20-30 at newegg if you think the IDE/jmicron is the culprit behind most of your problems.
Thanks for the advice Justin. Maybe the solution is to abandon use of the Realtek NIC (a pity to 'waste' what's freely available on the motherboard, though, in a way).
No problem, suggestions:
1. Run memtest86, ensure no errors after 1-2 passes.
2. Buy intel pci-e nic, ~$30
3. Buy sata dvd+rw, ~$20
Justin.