Thread: 5 messages, 2 authors, 2014-11-18

Re: Raid 6 Fail Event

From: Chris Murphy <hidden>
Date: 2014-11-16 19:52:02

On Nov 16, 2014, at 8:39 AM, Justin Stephenson [off-list ref] wrote:
> Hello,
>
> I am new to mdadm and have just experienced my first device failure on my RAID 6.
>
> I am wondering if someone might be able to help by outlining a proper protocol for troubleshooting and rebuilding this array (/proc/mdstat below).
>
> Here is how I might approach it:
>
> - remove the device
> - test the device
> - if the device tests OK, then re-add the device
> - if the device fails, then replace the device
> - resync
>
> Thank you for your consideration.
>
> Best,
>
> - Justin
>
> Here is the mdstat email:
>
> -----------------
>
> This is an automatically generated mail message from mdadm
> running on BigBlue
>
> A Fail event had been detected on md device /dev/md0.
>
> It could be related to component device /dev/sdh1.

The first step is to make sure your backup is current.

Second, you can check the drive without removing it from the array:

# smartctl -x /dev/sdh

Then look in dmesg for errors related to its ATA designation. The smartctl output should include the drive’s serial number; search for it with dmesg | grep <serial#> to find the drive’s ATA designation (port and device number), then use dmesg | grep ataX.YY to find any read/write error events that explain what’s going on.
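The serial-number lookup above can be sketched as a few shell functions. This is a minimal sketch, not a tested tool: the function names serial_from_smartctl, ata_from_serial and ata_events are hypothetical, and it assumes smartctl prints a "Serial Number:" line and that your kernel log mentions the serial on a line that also carries the ataX.YY tag. If the second step prints nothing, that assumption doesn't hold on your system and you'll need another way to map the drive to its port.

```sh
#!/bin/sh
# Minimal sketch of the serial -> ataX.YY -> dmesg lookup; names are hypothetical.
# Assumes smartctl prints a "Serial Number:" line and that the kernel log
# mentions that serial on a line carrying the ataX.YY tag.

# Read smartctl output on stdin; print the drive's serial number.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Read dmesg output on stdin; print the first ataX.YY designation that
# appears on a line mentioning the serial number passed as $1.
ata_from_serial() {
    [ -n "$1" ] || return 1      # an empty pattern would match every line
    grep -F -- "$1" | grep -o 'ata[0-9][0-9]*\.[0-9][0-9]*' | head -n 1
}

# Read dmesg output on stdin; print every line tagged with the ataX.YY
# passed as $1. Lines tagged only with the port (ataX:) are not matched.
ata_events() {
    [ -n "$1" ] || return 1
    grep -F -- "$1"
}

# Example, run as root against the drive from this thread:
#   serial=$(smartctl -x /dev/sdh | serial_from_smartctl)
#   ata=$(dmesg | ata_from_serial "$serial")
#   dmesg | ata_events "$ata"
```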

While you’re at it, the output of the following would also be helpful:

# smartctl -l scterc /dev/sdh
# cat /sys/block/sdh/device/state
# cat /sys/block/sdh/device/timeout

These commands are read-only: they report state but don’t change it, so they’re safe to run.
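If you want to collect the three read-only checks above in one go, a small wrapper like the following is one option. It is a sketch under stated assumptions: report_drive_state and the SYSFS override (which defaults to /sys and exists only so the function can be tried against a copied directory tree) are hypothetical names. It runs nothing beyond cat and smartctl -l scterc, so it stays read-only like the commands it wraps.

```sh
#!/bin/sh
# Sketch only: report_drive_state and SYSFS are hypothetical names.
# SYSFS defaults to /sys; it exists so the function can be tried against a
# copied directory tree. Everything here only reads: cat and smartctl -l scterc.

report_drive_state() {
    dev=$1                                   # e.g. sdh, without /dev/
    sys=${SYSFS:-/sys}/block/$dev/device
    echo "== $dev =="
    echo "state:   $(cat "$sys/state" 2>/dev/null || echo unknown)"
    echo "timeout: $(cat "$sys/timeout" 2>/dev/null || echo unknown)"
    if command -v smartctl >/dev/null 2>&1; then
        smartctl -l scterc "/dev/$dev"      # query SCT Error Recovery Control
    else
        echo "smartctl not found; skipping the SCT ERC query"
    fi
}

# Example, as root:
#   report_drive_state sdh
```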

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html