Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices)

(off-list ancestor, not in this archive)
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Neil Brown <hidden> · 2008-06-27
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Bill Davidsen <hidden> · 2008-06-29
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Matthias Urlichs <hidden> · 2008-07-14
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · David Greaves <hidden> · 2008-07-14
RE: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · David Lethe <hidden> · 2008-07-14
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Matthias Urlichs <hidden> · 2008-07-14
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Richard Scobie <hidden> · 2008-07-14
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Matthias Urlichs <hidden> · 2008-07-15
Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices) · Keld Jørn Simonsen <hidden> · 2008-07-15

From: Keld Jørn Simonsen <hidden>
Date: 2008-07-15 14:24:50

On Tue, Jul 15, 2008 at 12:58:16AM +0200, Matthias Urlichs wrote:

Hi,

However, even if they do in fact continue to deteriorate, the ability to
re-map the offending areas and continue gives me an order of magnitude
more time to deal with the problem.

In fact, as I said, there may be problems lurking on other disks which I
just haven't found yet (how often do you read all 5TB of your data?),
which means that a feature like this is the difference between being
able to recover and certain data loss, RAID-6 notwithstanding.

One idea about this: one could read and rewrite the disks periodically,
say once a month. That way, single-bit errors that have developed
on the disks could be repaired, since the drive's error correction (ECC)
recovers the one-bit error and the data is stored cleanly again when it
is written back. For a RAID, if an error occurs, then the sound data
from the other disks could be used, and if the error persists after a
rewrite on the bad disk, that data should then be remapped to a sound
area on the drive. Maybe people have already implemented this. SMART
data could also be consulted.
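
As a rough illustration of the loop described above, here is a toy
Python sketch. It is only a model under stated assumptions: ToyDisk and
scrub are hypothetical names, the "mirror" stands in for whatever
redundancy supplies the sound data (on RAID-6 that would be a
reconstruction from parity), and on a real drive the remapping would be
done by the firmware, not by the caller.

```python
class ToyDisk:
    """Hypothetical in-memory disk: a list of blocks plus two kinds of bad spots."""

    def __init__(self, blocks, soft_bad=(), hard_bad=()):
        self.blocks = list(blocks)
        self.soft_bad = set(soft_bad)  # unreadable until the block is rewritten
        self.hard_bad = set(hard_bad)  # still unreadable after a rewrite
        self.remapped = set()          # blocks moved to a spare area

    def read(self, i):
        if i in self.soft_bad or i in self.hard_bad:
            raise IOError(f"read error at block {i}")
        return self.blocks[i]

    def write(self, i, data):
        self.blocks[i] = data
        self.soft_bad.discard(i)       # a rewrite clears a soft error only

    def remap(self, i, data):
        self.hard_bad.discard(i)       # the block now lives in a sound area
        self.remapped.add(i)
        self.blocks[i] = data


def readable(disk, i):
    """Return True if block i can be read without an error."""
    try:
        disk.read(i)
        return True
    except IOError:
        return False


def scrub(disk, mirror):
    """Read every block; rewrite unreadable ones from the mirror, remap if that fails."""
    rewritten, remapped = [], []
    for i in range(len(disk.blocks)):
        if readable(disk, i):
            continue
        good = mirror.read(i)          # sound data from the redundant copy
        disk.write(i, good)
        if readable(disk, i):
            rewritten.append(i)
        else:
            disk.remap(i, good)        # error persisted after the rewrite
            remapped.append(i)
    return rewritten, remapped
```

With one soft and one hard error on the toy disk, scrub() reports the
first as rewritten and the second as remapped, and both blocks are
readable afterwards.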

I thought of badblocks -n to do this, but the RAID check could also be
a place to do it. When writing, one should of course take care that
nobody else is writing the same data.
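
For the RAID check route, a minimal Python sketch of how a monthly job
might kick one off through md's sysfs files. The file names sync_action
and mismatch_cnt are the ones md exposes; the function names and the
example path /sys/block/md0/md (that is, the array name md0) are
assumptions for illustration, and the job would have to run as root.

```python
from pathlib import Path


def start_check(md_dir):
    """Request an md consistency check; return False if the array is busy.

    md_dir is the array's sysfs directory, e.g. /sys/block/md0/md
    (md0 is an assumed array name).
    """
    action = Path(md_dir) / "sync_action"
    if action.read_text().strip() != "idle":
        return False                   # a resync, recovery or check is already running
    action.write_text("check\n")       # md then reads every stripe of the array
    return True


def mismatch_count(md_dir):
    """Return the mismatch counter md reports for the most recent check."""
    return int((Path(md_dir) / "mismatch_cnt").read_text().strip())
```

Since the check runs inside md itself, it is coordinated with normal
writes to the array, which takes care of the "nobody else is writing"
concern without unmounting anything.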

best regards
keld
