Re: Raid corruption problems.
From: John McMonagle <hidden>
Date: 2006-12-24 14:00:42
Bill Davidsen wrote:
> John McMonagle wrote:
> > Have a raid1 backup server that seems to get corrupted. This is the
> > 3rd time in about a year. Have 2 other backup servers that were
> > cloned from this one that have no problems.
> >
> > Done a couple of kernel upgrades recently. Now has 2.6.18-2 kernel.
> > It's based on Debian sarge.
> >
> > It's a low end Intel server motherboard using the ata_piix sata
> > driver. Have another motherboard just like it doing raid1 with sata
> > drives that has had no problems, but it has a much lighter disk load.
> >
> > smartctl has never shown any problems.
> >
> > In /sys/block/md2 did
> >   echo check > sync_action
> > No error messages but mismatch_cnt is 1152.
> > rc0/errors and rc1/errors are both 0.
> >
> > I'm guessing a hardware problem. Any suggestions?
>
> Since memory is the easiest to test, I'd try memtest86+ for at least
> 12 hr. If this were PATA I'd suggest replugging the cables, but it's
> lower probability with SATA. Still, probably worth trying.
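
For anyone repeating the check described in the quoted message, the steps can be sketched in Python roughly as follows. This is only an illustration, not something taken from the thread: it assumes the md attributes sit under /sys/block/<array>/md/ (so the file written to is sync_action), that it is run as root, and the helper names and the sysfs_root parameter are invented here for the example.

```python
from pathlib import Path
import time


def md_dir(array: str, sysfs_root: str = "/sys") -> Path:
    """Directory holding the md attributes for an array such as 'md2'.

    The sysfs_root parameter is hypothetical; it exists only so the
    helpers can be pointed at a fake tree when trying them out.
    """
    return Path(sysfs_root) / "block" / array / "md"


def read_attr(array: str, name: str, sysfs_root: str = "/sys") -> str:
    """Return one md attribute as a stripped string."""
    return (md_dir(array, sysfs_root) / name).read_text().strip()


def start_check(array: str, sysfs_root: str = "/sys") -> None:
    """Equivalent of 'echo check > sync_action' (needs root on a real system)."""
    (md_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def wait_until_idle(array: str, sysfs_root: str = "/sys",
                    poll_seconds: float = 30.0) -> None:
    """Poll sync_action until it reads 'idle'.

    Assumption: the kernel already reports the check as running by the
    time this is first called; a short pause after start_check() may be
    needed in practice.
    """
    while read_attr(array, "sync_action", sysfs_root) != "idle":
        time.sleep(poll_seconds)


def mismatch_count(array: str, sysfs_root: str = "/sys") -> int:
    """Return mismatch_cnt as an integer."""
    return int(read_attr(array, "mismatch_cnt", sysfs_root))
```

On the real machine the sequence would be start_check("md2"), then wait_until_idle("md2"), then mismatch_count("md2"); a non-zero result after the check finishes corresponds to the 1152 reported above.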

Ran Memtest86+ for over 18 hours with no errors. Also have ecc ram.
I can look at the cables next time I'm there.
Doesn't sata do some sort of error checking over the cables?

Anything else to try?

      Memtest86+ v1.65      | Pass 74% ############################
Pentium 4 (0.09) 2793 MHz   | Test 61% #######################
L1 Cache:   16K 17135MB/s   | Test #7  [Random number sequence]
L2 Cache: 1024K 15179MB/s   | Testing:  112K - 1024M 1024M
Memory  : 1024M  2059MB/s   | Pattern:  189170f4
Chipset : Intel i875P (ECC : Detect / Correct) - PAT : Enabled
Settings: RAM : 199 MHz (DDR398) / CAS : 3-3-3-8 / Dual Channel (128 bits)

 WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors ECC Errs
 ---------  ------  -------  --------  -----  ---  ----  ----  ------ --------
  18:52:27   1024M     120K  e820-Std    on   off   Std    56       0
 -----------------------------------------------------------------------------

-- 
John McMonagle
IT Manager
Advocap Inc.