Re: RAID6 dead on the water after Controller failure
From: Phil Turmel <hidden>
Date: 2014-02-15 15:12:49
Good morning Florian,

On 02/15/2014 07:31 AM, Florian Lampel wrote:
Greetings, first of all - thanks to Phil Turmel for pointing me in the right direction. I checked all the cables and true enough, the System SSD's cable's shielding was halfway peeled off.
Very good.
Anyway, the current state is as follows:

*) The missing HDDs came up right after the reboot, and I had to use the "bootdegraded=true" kernel option.
*) All 12 drives are functional.

Here is a link to the requested output of

    mdadm -E /dev/sd[abcd]1
    for x in /dev/sd[a-z] ; do echo $x : ; smartctl -x $x ; done

as well as

    mdadm --examine /dev/sd[abcdefghijklmnop]1

Link: h__p://pastebin.com/v6yzn3KX
Device order has changed, summary:

/dev/sda1: WD-WMC300595440 Device #4 @442
/dev/sdb1: WD-WMC300595880 Device #5 @442
/dev/sdc1: WD-WMC1T1521826 Device #6 @442
/dev/sdd1: WD-WMC300314126 spare
/dev/sde1: WD-WMC300595645 Device #8 @435
/dev/sdf1: WD-WMC300314217 Device #9 @435
/dev/sdg1: WD-WMC300595957 Device #10 @435
/dev/sdh1: WD-WMC300313432 Device #11 @435
/dev/sdj1: WD-WMC300312702 Device #0 @442
/dev/sdk1: WD-WMC300248734 Device #1 @442
/dev/sdl1: WD-WMC300314248 Device #2 @442
/dev/sdm1: WD-WMC300585843 Device #3 @442

and your SSD is now /dev/sdi.
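A per-device summary of role and event count like the one above can be pulled out of `mdadm -E` with a short loop. This is only a sketch: `md_summary` is a made-up helper name, and it assumes the superblocks are version 1.x, where `mdadm -E` prints "Events" and "Device Role" lines. Drive serial numbers are not included; those come from smartctl.

```shell
#!/bin/sh
# md_summary (hypothetical helper): for each member partition given as an
# argument, print "<device> <role> @<event count>" taken from mdadm -E.
# Assumes v1.x metadata, where the output has "Events : N" and
# "Device Role : ..." lines.
md_summary() {
    for dev in "$@"; do
        mdadm -E "$dev" | awk -v d="$dev" -F' : ' '
            /Events/      { events = $2 }
            /Device Role/ { role = $2 }
            END           { print d, role, "@" events }
        '
    done
}
```

Called as `md_summary /dev/sd[a-h]1 /dev/sd[j-m]1` (as root), it prints one line per member, which makes it easy to spot the members whose event count lags behind.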
My findings: The Event count does differ, but not by much. As my next step, I would follow Phil Turmel's advice and reassemble the Array using the --force option, to be precise:

    mdadm -Afv /dev/md0 /dev/sd[abcdefgjklm]1
Not quite. What was 'h' is now 'd'. Use:

    mdadm -Afv /dev/md0 /dev/sd[abcefghjklm]1
Could you please advise me whether this next step is all right to do now that we have new logs etc.?
Yes. You may also need "mdadm --stop /dev/md0" first if your boot process partially assembled the array already. After assembly, your array will be single-degraded but fully functional. That would be a good time to back up any critical data that isn't already in a backup. Then you can add /dev/sdd1 back into the array and let it rebuild.
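Put together, the sequence described above might look like the following sketch. The `run`, `recover_array` and `readd_spare` names are made up for illustration, and the script defaults to a dry run that only prints each command; set DRY_RUN=0 and run as root to actually execute them. The device names are the ones from this thread and will differ on any other system.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: stop a partially assembled array, then force-assemble it from
# the eleven active members (sdd1 is the spare and sdi is the SSD, so
# both are left out of the glob).
recover_array() {
    run mdadm --stop /dev/md0
    run mdadm -Afv /dev/md0 /dev/sd[abcefghjklm]1
    # Confirm the array is up (degraded) before touching any data.
    run cat /proc/mdstat
}

# Step 2: only after critical data has been backed up, add the remaining
# drive back so the array can rebuild onto it.
readd_spare() {
    run mdadm /dev/md0 --add /dev/sdd1
}
```

The two steps are kept as separate functions on purpose, so that the backup can happen between the forced assembly and the rebuild.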
Thanks in advance,
Florian Lampel

PS: Thanks again to Phil for pointing out that --create would be madness.
One more thing: your drives report never having a self-test run. You should have a cron job that triggers a long background self-test on a regular basis. Weekly, perhaps.

Similarly, you should have a cron job trigger an occasional "check" scrub on the array, too. Not at the same time as the self-tests, though. (I understand some distributions have this already.)

HTH,

Phil
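As a rough illustration of the maintenance jobs suggested above, the two tasks could be wrapped in a small script that cron calls on different days. The function names and the dispatch arguments are made up for this sketch, and the script defaults to a dry run that only prints what it would do. `smartctl -t long` starts the drive's background long self-test, and writing "check" to the md `sync_action` file in sysfs starts a scrub.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a long background self-test on every drive given as an argument.
start_long_selftests() {
    for dev in "$@"; do
        run smartctl -t long "$dev"
    done
}

# Start a "check" scrub on the named md array (e.g. md0).
scrub_array() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ echo check > /sys/block/$1/md/sync_action"
    else
        echo check > "/sys/block/$1/md/sync_action"
    fi
}

# Hypothetical dispatch so cron can call the script with a task name,
# e.g. "selftest /dev/sda /dev/sdb ..." one day and "scrub md0" another.
case "${1:-}" in
    selftest) shift; start_long_selftests "$@" ;;
    scrub)    scrub_array "${2:-md0}" ;;
esac
```

Scheduling the two tasks on different days keeps the scrub and the self-tests from competing for the same disks. Distributions that already ship an md check job only need the self-test half.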