Thread: 9 messages, 8 authors, 2008-07-15

Re: How to avoid complete rebuild of RAID 6 array (6/8 active devices)

From: Matthias Urlichs <hidden>
Date: 2008-07-14 22:58:16

Hi,

David Greaves:
> I've found that once a disk starts to go bad there is a very strong
> tendency for it to continue to deteriorate.
In my experience, that's true for older disks, but not necessarily for
those that are new and simply have a spot or two where the magnetizable
layer is a wee bit too thin.

However, even if they do in fact continue to deteriorate, the ability to
re-map the offending areas and continue gives me an order of magnitude
more time to deal with the problem.

In fact, as I said, there may be problems lurking on other disks which I
just haven't found yet (how often do you read all 5TB of your data?),
which means that a feature like this is the difference between being
able to recover and certain data loss, RAID-6 notwithstanding.
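The "re-map and continue" idea, applied during a full read pass over the array, can be sketched roughly as follows. This is a toy model, not md code: it assumes single XOR parity (RAID-6's second syndrome is left out for brevity), in-memory "disks", and a drive that remaps a bad sector when that sector is rewritten. `Disk`, `MediaError` and `scrub` are hypothetical names.

```python
from functools import reduce
from operator import xor


class MediaError(Exception):
    """Raised when a simulated disk cannot read a block."""


class Disk:
    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)  # indices of unreadable blocks

    def read(self, i):
        if i in self.bad:
            raise MediaError(i)
        return self.blocks[i]

    def write(self, i, value):
        # Assumption: rewriting a bad block lets the drive remap it.
        self.bad.discard(i)
        self.blocks[i] = value


def scrub(disks):
    """Read every stripe; rebuild and rewrite a single unreadable block.

    Returns the number of blocks repaired. With single XOR parity, a
    stripe with two unreadable blocks cannot be repaired and raises.
    """
    repaired = 0
    for stripe in range(len(disks[0].blocks)):
        values, failed = [], []
        for d, disk in enumerate(disks):
            try:
                values.append(disk.read(stripe))
            except MediaError:
                failed.append(d)
        if not failed:
            continue
        if len(failed) > 1:
            raise MediaError(f"stripe {stripe}: {len(failed)} bad blocks")
        # XOR of all surviving members (data + parity) equals the missing one.
        rebuilt = reduce(xor, values, 0)
        disks[failed[0]].write(stripe, rebuilt)  # repair in place, keep the disk
        repaired += 1
    return repaired
```

The point of the design is that the disk with the bad spot stays in the array after the rewrite, so its redundancy is still available if a latent error turns up on another member later in the same pass.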


NB, one other problem I've observed (older kernel, I don't know if it's
been fixed) is that a resync is restarted from the beginning when a
fault on a second disk is encountered. BAD idea.
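Avoiding that restart mostly comes down to remembering how far the resync got and treating a per-stripe fault as something to record rather than a reason to start over. A hedged sketch, assuming a caller-supplied, hypothetical `rebuild_stripe` callback and a `progress` dict standing in for an on-disk checkpoint:

```python
from typing import Callable, Dict, List


class MediaError(Exception):
    """Raised when a stripe cannot be rebuilt (e.g. a fault on another disk)."""


def resync(n_stripes: int,
           rebuild_stripe: Callable[[int], None],
           progress: Dict[str, int]) -> List[int]:
    """Rebuild stripes from progress["pos"] up to n_stripes.

    A fault on one stripe is recorded and skipped instead of restarting
    the whole pass from stripe 0. progress["pos"] acts as a checkpoint:
    if the caller persists it, an interrupted resync resumes where it stopped.
    Returns the stripes that could not be rebuilt during this call.
    """
    unrecovered: List[int] = []
    while progress["pos"] < n_stripes:
        stripe = progress["pos"]
        try:
            rebuild_stripe(stripe)
        except MediaError:
            unrecovered.append(stripe)  # revisit later; keep going
        progress["pos"] = stripe + 1    # checkpoint after each stripe
    return unrecovered
```

Any other exception propagates before the checkpoint advances, so calling `resync` again with the same `progress` retries the interrupted stripe and continues from there.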


NB2, my ideal RAID recovery scenario looks like this:
* When a disk access fails, the offender is switched to write-only mode.
  I.e., the kernel ignores it when reading, but still tries to write
  correct data when something's updated.
* In order to re-sync a new disk, simply duplicate the old one if it
  hasn't been removed yet; of course, you need to do "real" recovery for
  the bad spots, and you need the aforementioned write-only code to
  update both disks when writing to an area that has already been synced.
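The scenario above can be modelled in a few dozen lines. This is a minimal sketch, assuming a toy single-parity array (the last member holds the XOR of the others) rather than RAID-6, in-memory disks, and a drive that remaps a bad sector when it is rewritten. `Disk`, `Array`, `read_member`, `write`, `start_replacement` and `resync` are hypothetical names, not md internals.

```python
from functools import reduce
from operator import xor


class MediaError(Exception):
    """Raised by a simulated disk when a block cannot be read."""


class Disk:
    def __init__(self, blocks, bad=()):
        self.blocks = list(blocks)
        self.bad = set(bad)  # indices of unreadable blocks

    def read(self, i):
        if i in self.bad:
            raise MediaError(i)
        return self.blocks[i]

    def write(self, i, value):
        self.bad.discard(i)  # assumption: a rewrite remaps the bad sector
        self.blocks[i] = value


class Array:
    """Toy single-parity array: the last member holds the XOR of the others."""

    def __init__(self, disks):
        self.disks = disks
        self.write_only = set()  # members skipped on read but still written
        self.replacement = None  # (slot, new_disk) while a resync runs
        self.sync_pos = 0        # stripes below this are already copied

    def _rebuild(self, slot, i):
        # Reconstruct member `slot` of stripe i from all other members.
        others = [d.read(i) for s, d in enumerate(self.disks) if s != slot]
        return reduce(xor, others, 0)

    def read_member(self, slot, i):
        if slot in self.write_only:
            return self._rebuild(slot, i)  # never read the suspect disk
        try:
            return self.disks[slot].read(i)
        except MediaError:
            self.write_only.add(slot)      # demote instead of ejecting
            return self._rebuild(slot, i)

    def _put(self, slot, i, value):
        self.disks[slot].write(i, value)   # write-only members still get writes
        if self.replacement and self.replacement[0] == slot and i < self.sync_pos:
            self.replacement[1].write(i, value)  # keep the synced area current

    def write(self, slot, i, value):
        """Write data member `slot` of stripe i and update parity."""
        parity_slot = len(self.disks) - 1
        data = [value if s == slot else self.read_member(s, i)
                for s in range(parity_slot)]
        self._put(slot, i, value)
        self._put(parity_slot, i, reduce(xor, data, 0))

    def start_replacement(self, slot, new_disk):
        self.replacement = (slot, new_disk)
        self.sync_pos = 0

    def resync(self, limit=None):
        """Copy up to `limit` stripes (all if None); swap in when finished."""
        slot, new = self.replacement
        old = self.disks[slot]
        n = len(old.blocks)
        end = n if limit is None else min(n, self.sync_pos + limit)
        while self.sync_pos < end:
            i = self.sync_pos
            try:
                value = old.read(i)             # plain copy of the old disk
            except MediaError:
                value = self._rebuild(slot, i)  # "real" recovery for bad spots
            new.write(i, value)
            self.sync_pos = i + 1
        if self.sync_pos == n:
            self.disks[slot] = new
            self.write_only.discard(slot)
            self.replacement = None
```

In this model, normal reads never touch the suspect member, and apart from the bad spots the copy only reads the old disk and writes the new one, so the healthy members see no extra read traffic; the remaining cost is the XOR work for reconstructed reads.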

The _huge_ advantage of this process would be that a re-sync does not
affect the array's read performance at all (other than the higher CPU
usage). For some people, that can be quite important.

Now where can I get the largish chunk of time required to implement all
of this ... oh well.

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
The way to a man's heart is through the left ventricle.