Thread: 4 messages, 3 authors, 2015-06-02

Re: self healing of MD raid

From: Robin Hill <hidden>
Date: 2015-06-02 19:14:06

On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:
On Tue, Jun 2, 2015 at 12:53 PM, Robin Hill [off-list ref] wrote:
On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@keldix.com wrote:
Hi list

I wonder if MD RAID software is kind of self healing.
That is, if a read operation gets an IO error, then the logical
sector of the RAID can be recreated from the other sector(s)
of the raid, and then written out on the block which gave a read error.

This could work both for the mirrored RAID types, and for the
parity oriented RAID types.

Is that implemented in MD RAID?

Similarly the self healing process could be part of the monitoring
background processes.

Best regards
keld
Yes, this is implemented as standard for all forms of RAID with
redundant data (parity/mirror). A read error will automatically trigger
a rewrite of the faulty block with data recovered from the other
members. This rewrite should also trigger a remapping within the drive
if the original block proves to be unwritable as well.

Running a regular check (echo check > /sys/block/mdX/md/sync_action)
will do a full read of all active members in an array and therefore
trigger rewrites for any unreadable blocks. This is often set up as part
of the standard distro cron jobs, but should be set up manually if not.
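
A minimal sketch of the check described above, assuming a POSIX shell and root privileges. The function name md_start_check and its optional sysfs-root argument are illustrative only and not part of any md tooling. It requests a check only when the array is idle:

```shell
# Hypothetical helper: request a scrub ("check") of one md array.
# $1 = array name (e.g. md0), $2 = sysfs root (defaults to /sys/block;
# overridable only so the function can be tried without a real array).
md_start_check() {
    md_dev="$1"
    sysfs_root="${2:-/sys/block}"
    action_file="$sysfs_root/$md_dev/md/sync_action"

    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file (not an md array, or not root?)" >&2
        return 1
    fi

    # Do not interrupt a resync, recovery or check already in progress.
    current=$(cat "$action_file")
    if [ "$current" != "idle" ]; then
        echo "$md_dev is busy ($current), not starting a check" >&2
        return 2
    fi

    echo check > "$action_file"
}
```

Something like `md_start_check md0` could then be called from a weekly cron job. Progress is visible in /proc/mdstat, and /sys/block/md0/md/mismatch_cnt should report how many sectors were found inconsistent once the check has finished.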
Do you know what the MD action would be if it cannot recover the
faulty block from the other members? Assuming not enough members are
online, does it just print a warning in dmesg? Does anyone in
the MD layer keep track of the number of corruption events like this?

--Alireza
If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error).

If you have a bad block log on the array member (a relatively new
feature) then it will record that the block is invalid. Otherwise I
don't think there's any tracking within the md layer - you'd need to
fall back on whatever tracking there is on the underlying block device
(e.g. SMART data).
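
As a rough sketch of how the bad block log could be inspected, assuming the kernel exposes it per member as /sys/block/mdX/md/dev-*/bad_blocks with one "start length" pair (in sectors) per line. The function name md_list_bad_blocks and its optional sysfs-root argument are made up for illustration:

```shell
# Hypothetical helper: print the recorded bad-block ranges of every member
# of one md array. $1 = array name, $2 = sysfs root (defaults to /sys/block).
md_list_bad_blocks() {
    md_dev="$1"
    sysfs_root="${2:-/sys/block}"

    for bb_file in "$sysfs_root/$md_dev"/md/dev-*/bad_blocks; do
        # An unmatched glob stays literal and is not readable; skip it.
        [ -r "$bb_file" ] || continue
        member=$(basename "$(dirname "$bb_file")")
        # Assumed format: "<start sector> <length in sectors>" per line.
        while read -r start length; do
            [ -n "$start" ] || continue
            echo "$member: $length sector(s) from sector $start"
        done < "$bb_file"
    done
}
```

An empty result from `md_list_bad_blocks md0` would mean either that no bad blocks are recorded or that the members have no bad block log. For the drive-level counters, `smartctl -A /dev/sdX` from smartmontools prints the SMART attributes, such as the reallocated sector count.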

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        [off-list ref] |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
