Re: Fw: Why does one get mismatches?
From: Jon Hardcastle <hidden>
Date: 2010-01-25 10:07:11
--- On Sun, 24/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:

> From: Goswin von Brederlow <redacted>
> Subject: Re: Fw: Why does one get mismatches?
> To: Jon@eHardcastle.com
> Cc: "Goswin von Brederlow" <redacted>, linux-raid@vger.kernel.org
> Date: Sunday, 24 January, 2010, 23:13
>
> Jon Hardcastle [off-list ref] writes:
>
>> --- On Fri, 22/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>>
>>> From: Goswin von Brederlow <redacted>
>>> Subject: Re: Fw: Why does one get mismatches?
>>> To: Jon@eHardcastle.com
>>> Cc: linux-raid@vger.kernel.org
>>> Date: Friday, 22 January, 2010, 18:13
>>>
>>> Jon Hardcastle [off-list ref] writes:
>>>
>>>> --- On Tue, 19/1/10, Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:
>>>>
>>>>> From: Jon Hardcastle <redacted>
>>>>> Subject: Why does one get mismatches?
>>>>> To: linux-raid@vger.kernel.org
>>>>> Date: Tuesday, 19 January, 2010, 10:04
>>>>>
>>>>> Hi,
>>>>>
>>>>> I kicked off a check/repair cycle on my machine after I moved the
>>>>> physical ordering of my drives around, and I am now on my second
>>>>> check/repair cycle and it has kept finding mismatches.
>>>>>
>>>>> Is it correct that the mismatch value after a repair was needed
>>>>> should equal the value present after a check? What if it doesn't?
>>>>> What does it mean if another check STILL reveals mismatches?
>>>>>
>>>>> I had something similar after I reshaped from RAID 5 to 6; I had to
>>>>> run check/repair/check/repair several times before I got my 0.
>>>>
>>>> Guys,
>>>>
>>>> Anyone got any suggestions here? I am now on my ~5th check/repair
>>>> cycle and after a reboot the first check is still returning 8.
>>>>
>>>> All I have done is move the drives around. It is the same
>>>> controllers/cables/etc.
>>>>
>>>> I really don't like the seemingly random nature of what can/does/has
>>>> caused the mismatches.
>>>
>>> There is some unknown corruption going on with raid1 that causes
>>> mismatches, but it is believed that it will never occur on any used
>>> block. Swapping is a likely cause. Any swap device on the raid? Try
>>> turning that off.
>>>
>>> If that doesn't help, try unmounting filesystems or remounting RO.
>>>
>>> MfG
>>> Goswin
>>
>> Hello, my usual savior Goswin!
>>
>> The deal is it is a 7 drive RAID 6 array. It has LVM on it and is not
>> used for swapping. I have unmounted all LVs and still got mismatches. I
>> ran smartctl --test=long on all drives: nothing. I have now dismantled
>> the array and am 3/4 of the way through 'badblocks -svn' on each of the
>> component drives. I have a hunch that it may be a dodgy SATA cable but
>> have no evidence. No errors in the log, nothing in dmesg.
>>
>> Is there any way to get more information? I am starting to think this
>> has happened more since I changed from RAID 5 to 6, which I did less
>> than a month ago.
>>
>> The only lead I have is that while doing the badblocks, one drive ran
>> at ~10-15 MB/s whereas the rest are going at ~30. I have another
>> identical model drive coming up, so I will see if that one is slow too.
>> But the lack of logging info is unhelpful and worrying, and the
>> prospect of silent corruption is a big worry!
>
> You did run a repair pass and not just repeated check passes, right?
> Check itself only counts the mismatches but does not correct them. If
> the raid is unused (vgchange -a n) and you first do a repair and then a
> check, then that definitely should not find any mismatches.
>
> MfG
> Goswin
Hello!
Yes, I have a simple script that first does a check and then, if there are mismatches, does a repair. I have then been manually rerunning a check and I keep getting mismatches. It goes like this: 232, 8, 24, 8, 8, 16, 16, 24, 24, 8, 16, 24. But I have also done this manually and run several repairs in a row (assuming a repair will report 0 if no work is to be done).
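A check-then-repair script like the one described above can be sketched against the md sysfs interface, where writing "check" or "repair" to `/sys/block/<md>/md/sync_action` starts a pass and `/sys/block/<md>/md/mismatch_cnt` holds the count it reports. This is a hypothetical reconstruction, not the actual script from this thread: the helper names, the 30-second polling interval and the device name `md0` are assumptions. It must run as root and, following the advice above, is best run with the volume group deactivated (`vgchange -a n`).

```python
# Hypothetical sketch (not the script from this thread): run an md "check"
# pass via sysfs and, if mismatches are reported, follow it with "repair".
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable, Optional, Tuple


def md_dir(device: str, sysfs_root: str = "/sys/block") -> Path:
    # e.g. /sys/block/md0/md; sysfs_root is overridable so the helpers can be
    # exercised against a scratch directory instead of real sysfs.
    return Path(sysfs_root) / device / "md"


def read_mismatch_cnt(md: Path) -> int:
    # mismatch_cnt holds a single integer followed by a newline.
    return int((md / "mismatch_cnt").read_text().strip())


def is_idle(md: Path) -> bool:
    # sync_action reads "idle" when no check/repair/resync is running.
    return (md / "sync_action").read_text().strip() == "idle"


def run_pass(md: Path, action: str, poll_seconds: float = 30.0) -> int:
    """Start a 'check' or 'repair' pass, wait for it to finish, return mismatch_cnt."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action}")
    if not is_idle(md):
        raise RuntimeError("array is busy; refusing to start a new pass")
    (md / "sync_action").write_text(action + "\n")
    time.sleep(poll_seconds)  # give md a moment to leave the idle state
    while not is_idle(md):
        time.sleep(poll_seconds)
    return read_mismatch_cnt(md)


def check_then_repair(run: Callable[[str], int]) -> Tuple[int, Optional[int]]:
    """Run a check; only if it reports mismatches, run a repair.

    `run` is any callable that performs one pass and returns mismatch_cnt.
    Returns (count after check, count after repair or None if skipped).
    """
    found = run("check")
    if found == 0:
        return found, None
    return found, run("repair")
```

As root, a single cycle on `md0` would be `check_then_repair(lambda action: run_pass(md_dir("md0"), action))`. The pass runner is passed in as a callable so the decision logic stays separate from the sysfs polling, which makes it easy to rerun a check afterwards, or to loop until a check reports 0.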
Now the array is completely dismantled and I am running badblocks on the drives; I am on the last 2 of the 7 drives and I still have no leads. No bad blocks, no offline uncorrectable sectors, no pending sectors, no dmesg errors, nothing. I have absolutely no leads whatsoever.
The only things I have left to try are a full memory test, disconnecting and reseating the additional SATA controllers and, oh, buying 7 new SATA cables in case one is bad.
But it would be REALLY helpful to know on which drive the mismatches have occurred.
Any help here would be gratefully received! I might even try converting the array back to RAID 5, as I remember I had mismatches immediately after I converted from 5 to 6.