Thread (2 messages), 2 authors, 2010-01-25

Re: Fw: Why does one get mismatches?

From: Jon Hardcastle <hidden>
Date: 2010-01-25 10:07:11

--- On Sun, 24/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
From: Goswin von Brederlow <redacted>
Subject: Re: Fw: Why does one get mismatches?
To: Jon@eHardcastle.com
Cc: "Goswin von Brederlow" <redacted>, linux-raid@vger.kernel.org
Date: Sunday, 24 January, 2010, 23:13
Jon Hardcastle [off-list ref] writes:
--- On Fri, 22/1/10, Goswin von Brederlow <goswin-v-b@web.de> wrote:
From: Goswin von Brederlow <redacted>
Subject: Re: Fw: Why does one get mismatches?
To: Jon@eHardcastle.com
Cc: linux-raid@vger.kernel.org
Date: Friday, 22 January, 2010, 18:13
Jon Hardcastle [off-list ref] writes:
--- On Tue, 19/1/10, Jon Hardcastle <jd_hardcastle@yahoo.com> wrote:
From: Jon Hardcastle <redacted>
Subject: Why does one get mismatches?
To: linux-raid@vger.kernel.org
Date: Tuesday, 19 January, 2010, 10:04
Hi,

I kicked off a check/repair cycle on my machine after I moved the physical ordering of my drives around, and I am now on my second check/repair cycle and it has kept finding mismatches.

Is it correct that the mismatch value after a repair was needed should equal the value present after a check? What if it doesn't? What does it mean if another check STILL reveals mismatches?

I had something similar after I reshaped from raid 5 to 6: I had to run check/repair/check/repair several times before I got my 0.
Guys,

Anyone got any suggestions here? I am now on my ~5th check/repair and after a reboot the first check is still returning 8.

All I have done is move the drives around. It is the same controllers/cables/etc.

I really don't like the seemingly random nature of what can/does/has caused the mismatches.

There is some unknown corruption going on with raid1 that causes mismatches, but it is believed that it will never occur on any used block. Swapping is a likely cause.

Any swap device on the raid? Try turning that off. If that doesn't help, try unmounting filesystems or remounting RO.


MfG
        Goswin
Hello, my usual savior Goswin!

The deal is it is a 7-drive raid 6 array. It has LVM on it and is not used for swapping. I have unmounted all LVs and still got mismatches. I ran smartctl --test=long on all drives: nothing. I have now dismantled the array and am 3/4 of the way through 'badblocks -svn' on each of the component drives. I have a hunch that it may be a dodgy SATA cable but have no evidence. No errors in the log, nothing in dmesg.

Is there any way to get more information? I am starting to think this has happened more since I changed from raid 5 to 6, which I did < 1 month ago.

The only lead I have is that whilst running badblocks, 1 drive ran at ~10-15MB/s whereas the rest are going at ~30. I have another identical model drive coming up, so I will see if that one is slow too. But the lack of logging info is unhelpful and worrying, and the prospect of silent corruption is a big worry!


You did run a repair pass and not just repeated check passes, right? Check itself only counts the mismatches but does not correct them. If the raid is unused (vgchange -a n) and you first run repair and then check, then that definitely should not find any mismatches.

MfG

        Goswin
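
The repair-then-check sequence suggested above can be sketched roughly as follows. This is only an illustration, not the script anyone in the thread actually ran: the device name md0 and the helper names are assumptions, it relies on the md sysfs attributes sync_action and mismatch_cnt, it must run as root, and the volume group is assumed to have been deactivated beforehand with vgchange -a n.

```python
import time
from pathlib import Path
from typing import Tuple


def read_attr(md_dir: Path, name: str) -> str:
    """Read one md sysfs attribute, e.g. sync_action or mismatch_cnt."""
    return (md_dir / name).read_text().strip()


def run_sync_action(md_dir: Path, action: str, poll_seconds: float = 5.0) -> int:
    """Start 'check' or 'repair', wait until md is idle, return mismatch_cnt."""
    (md_dir / "sync_action").write_text(action + "\n")
    # md reports the running action here and goes back to "idle" when done.
    while read_attr(md_dir, "sync_action") != "idle":
        time.sleep(poll_seconds)
    return int(read_attr(md_dir, "mismatch_cnt"))


def repair_then_check(md_dir: Path, poll_seconds: float = 5.0) -> Tuple[int, int]:
    """Run one repair pass, then one check pass, and return both counts."""
    repaired = run_sync_action(md_dir, "repair", poll_seconds)
    remaining = run_sync_action(md_dir, "check", poll_seconds)
    return repaired, remaining


# Hypothetical usage (assumes the array is /dev/md0 and is otherwise unused):
#   repaired, remaining = repair_then_check(Path("/sys/block/md0/md"))
#   print("repair pass:", repaired, "following check:", remaining)
```

On an idle array the second number is the interesting one: if it is non-zero right after a repair, something is still changing the data between passes.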
Hello!

Yes, I have a simple script that first does a check, then if there are mismatches it does a repair. I have then been manually rerunning a check and I keep getting mismatches. It goes like this: 232, 8, 24, 8, 8, 16, 16, 24, 24, 8, 16, 24. But I have also done this manually and run several repairs in a row (assuming that a repair will return 0 if there is no work to be done).
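
A note on reading those numbers: every count is a multiple of 8. As far as the md documentation goes, mismatch_cnt is reported in sectors while most raid levels compare in page-sized units, so the count can overstate the real damage by the number of sectors per page. The sketch below converts the counts to pages, assuming 512-byte sectors and 4 KiB pages (both assumptions, not stated in the thread).

```python
SECTOR_BYTES = 512   # assumed sector size
PAGE_BYTES = 4096    # assumed page size
SECTORS_PER_PAGE = PAGE_BYTES // SECTOR_BYTES  # 8 under these assumptions

# mismatch_cnt values from successive check passes, as listed in the message
counts = [232, 8, 24, 8, 8, 16, 16, 24, 24, 8, 16, 24]


def to_pages(mismatch_sectors: int) -> int:
    """Convert a mismatch_cnt value in sectors to whole pages."""
    pages, remainder = divmod(mismatch_sectors, SECTORS_PER_PAGE)
    if remainder:
        raise ValueError("count is not a whole number of pages")
    return pages


pages = [to_pages(c) for c in counts]
print(pages)  # [29, 1, 3, 1, 1, 2, 2, 3, 3, 1, 2, 3]
```

Under these assumptions the later passes are each flagging only one to three pages, which is a very small and shifting set of locations.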

Now the array is completely dismantled and I am running badblocks on the drives, but I am on the last 2 of the 7 drives and I still have no leads. No bad blocks, no offline uncorrectable, no pending sectors, no dmesg errors, no nothing. I have absolutely no leads whatsoever.

The only things I have left to try are a full memory test and disconnecting and reseating the additional SATA controllers, oh and buying 7 new SATA cables in case 1 is bad.

But it would be REALLY helpful to know on which drive the mismatches have occurred.

Any help here would be gratefully received! I might even try converting the array back to raid 5, as I remember I had mismatches immediately after I converted from 5 to 6.


      