Re: self healing of MD raid
From: Robin Hill <hidden>
Date: 2015-06-02 19:14:06
On Tue Jun 02, 2015 at 01:01:31PM -0500, Alireza Haghdoost wrote:
> On Tue, Jun 2, 2015 at 12:53 PM, Robin Hill [off-list ref] wrote:
>> On Tue Jun 02, 2015 at 07:22:36PM +0200, keld@keldix.com wrote:
>>> Hi list
>>>
>>> I wonder whether MD RAID software is self-healing. That is, if a
>>> read operation gets an IO error, the logical sector of the RAID could
>>> be recreated from the other sector(s) of the RAID and then written
>>> out to the block that gave the read error. This could work both for
>>> the mirrored RAID types and for the parity-oriented RAID types. Is
>>> that implemented in MD RAID?
>>>
>>> Similarly, the self-healing process could be part of the monitoring
>>> background processes.
>>>
>>> Best regards
>>> keld
>>
>> Yes, this is implemented as standard for all forms of RAID with
>> redundant data (parity/mirror). A read error will automatically
>> trigger a rewrite of the faulty block with data recovered from the
>> other members. If the original block also proves to be unwritable,
>> this rewrite should trigger a remapping within the drive.
>>
>> Running a regular check (echo check > /sys/block/mdX/md/sync_action)
>> will do a full read of all active members in an array and therefore
>> trigger rewrites for any unreadable blocks. This is often set up as
>> part of the standard distro cron jobs, but should be set up manually
>> if not.
>
> Do you know what MD does if it cannot recover the faulty block from
> the other members? Assuming not enough members are online, does it
> just print a warning in dmesg? Does anyone in the MD layer keep track
> of the number of corruption events like this?
>
> --Alireza
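The quoted reply describes a regular check that can be scripted for cron. The script below is a minimal sketch of one. It assumes bash and the md sysfs attribute sync_action under /sys/block/mdX/md/. The script name, the SYSFS override (there only so the script can be exercised against a scratch directory) and the skip-if-busy logic are illustrative choices. They are not taken from any distro's cron job.

```sh
#!/bin/bash
# Hypothetical md-check.sh: start a "check" pass on every md array that is idle.
# SYSFS can be overridden (e.g. for testing); it defaults to the real /sys.
SYSFS="${SYSFS:-/sys}"

start_checks() {
    local action_file array state
    for action_file in "$SYSFS"/block/md*/md/sync_action; do
        [ -e "$action_file" ] || continue        # glob matched nothing: no md arrays
        array="${action_file#"$SYSFS"/block/}"   # e.g. "md0/md/sync_action"
        array="${array%%/*}"                     # e.g. "md0"
        state=$(cat "$action_file")
        if [ "$state" = "idle" ]; then
            # Full read of all active members; unreadable blocks get rewritten.
            echo check > "$action_file"
            echo "$array: check started"
        else
            # Another sync operation (resync, recover, ...) is already running.
            echo "$array: busy ($state), skipped"
        fi
    done
}

start_checks
```

Writing to sync_action requires root, and a running check shows up in /proc/mdstat. As a cron job this might be an /etc/cron.d entry such as "0 3 * * 0 root /usr/local/sbin/md-check.sh". The path and schedule are hypothetical.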
If the faulty block cannot be rebuilt from the other members then a read
error is passed on to the application and the array keeps running (the
same way a normal block device would handle a read error).
If you have a bad block log on the array member (a relatively new
feature) then it will record that the block is invalid. Otherwise I
don't think there's any tracking within the md layer - you'd need to
fall back on whatever tracking there is on the underlying block device
(e.g. SMART data).
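Where the members do carry a bad block log, the kernel lists each member's recorded ranges in /sys/block/mdX/md/dev-*/bad_blocks. Each range is a "start length" pair in sectors. The function below is a minimal sketch that summarises the log per member. It assumes that sysfs layout and bash; the function name and the SYSFS override are hypothetical.

```sh
#!/bin/bash
# Hypothetical helper: summarise the md bad block log of every array member.
# SYSFS can be overridden (e.g. for testing); it defaults to the real /sys.
SYSFS="${SYSFS:-/sys}"

report_bad_blocks() {
    local bb_file path array dev count sectors start len
    for bb_file in "$SYSFS"/block/md*/md/dev-*/bad_blocks; do
        [ -e "$bb_file" ] || continue            # glob matched nothing
        path="${bb_file#"$SYSFS"/block/}"        # e.g. "md0/md/dev-sda1/bad_blocks"
        array="${path%%/*}"                      # "md0"
        dev="${path%/bad_blocks}"
        dev="${dev##*/}"                         # "dev-sda1"
        count=0
        sectors=0
        # Each line is "<start sector> <length in sectors>".
        while read -r start len; do
            [ -n "$len" ] || continue
            count=$((count + 1))
            sectors=$((sectors + len))
        done < "$bb_file"
        echo "$array $dev: $count range(s), $sectors sector(s)"
    done
}

report_bad_blocks
```

An empty file reports zero ranges. With an mdadm release that supports bad block logs, "mdadm --examine-badblocks /dev/sdX1" reads the log from the member's own metadata. Drive-level counters such as reallocated sectors come from SMART instead (e.g. "smartctl -A /dev/sdX").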
Cheers,
Robin
--
___
( ' } | Robin Hill [off-list ref] |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |