Re: Raid 6 Fail Event
From: Chris Murphy <hidden>
Date: 2014-11-16 19:52:02
On Nov 16, 2014, at 8:39 AM, Justin Stephenson [off-list ref] wrote:
Hello,

I am new to MDADM and have just experienced my first device fail on my raid 6. I am wondering if someone might be able to help by outlining a proper protocol for troubleshooting and rebuilding this array (proc/mdstat below).

Here is how I might approach it:

- remove the device
- test the device
- if the device tests OK then re add the device
- if the device fails, then replace the device
- resync

Thank-you for your consideration.

Best,
- Justin

Here is the mdstat email
-----------------
This is an automatically generated mail message from mdadm running on BigBlue

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdh1.
First step is getting the backup current. Second, you can test the device without removing it:

# smartctl -x /dev/sdh

Then look in dmesg for errors related to its ata designation. You should be able to get a serial number from the smartctl output and can search for it with dmesg | grep <serial#> to find out what its ata designation (port and device number) is; then you can dmesg | grep ataX.YY to get any read/write error events that explain what’s going on.

While you’re at it, the following would be helpful as well:

# smartctl -l scterc /dev/sdh
# cat /sys/block/sdh/device/state
# cat /sys/block/sdh/device/timeout

These are read-only commands: they report state without changing it, so they are safe to run.

Chris Murphy
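
The read-only checks above could be collected into a small script. This is only a minimal sketch, assuming a POSIX shell, smartmontools installed, and root privileges. check_member and ata_events are hypothetical helper names, and ata7 in the usage comment is a placeholder for whatever port the kernel log actually reports for the drive.

```shell
#!/bin/sh
# Sketch only: check_member and ata_events are hypothetical helper names.

# Run the read-only checks against one array member, e.g. /dev/sdh.
# smartctl needs root; nothing here changes device or array state.
check_member() {
    dev=$1
    name=${dev##*/}                         # /dev/sdh -> sdh
    smartctl -x "$dev"                      # full SMART report, includes the serial number
    smartctl -l scterc "$dev"               # SCT error recovery control setting
    cat "/sys/block/$name/device/state"     # device state as the kernel sees it
    cat "/sys/block/$name/device/timeout"   # kernel command timeout, in seconds
}

# Filter kernel log text on stdin for one ATA port, e.g. "ata7".
# The [.:] suffix keeps "ata7" from also matching "ata70".
ata_events() {
    grep -E "${1}[.:]"
}

# Usage (ata7 is a placeholder port name):
#   check_member /dev/sdh
#   dmesg | ata_events ata7
```

Reading the log from stdin means the same filter works on live dmesg output or on a saved copy of the log.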