Thread: 6 messages, 3 authors, 2014-11-28

Re: mdadm raid5 single drive fail, single drive out of sync terror

From: Phil Turmel <hidden>
Date: 2014-11-26 15:47:06

Good morning Jon,

On 11/26/2014 10:08 AM, Jon Robison wrote:
> Hi all!
>
> I upgraded to mdadm-3.3-7.fc20.x86_64, and my raid5 array (normally
> /dev/sd[b-f]1) would no longer recognize /dev/sdb1. I ran
> `mdadm --detail --scan`, which resulted in a degraded array, then added
> /dev/sdb1, and it started rebuilding happily until 25% or so, when
> another failure seemed to occur.

Well, failures during rebuild of a raid5 are common.  In my experience,
including helping on this list, they are most often due to timeout
mismatch and a failure to scrub regularly.
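
For readers who have not met these two causes before, here is a minimal
sketch of the usual checks. It assumes the member disks are sdb through sdf
and the array is md0, as in this thread. The 7-second error recovery value
and the 180-second driver timeout are figures commonly used for this
purpose, not something stated in this message, and the helper names are
hypothetical. With DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0 and root, run) the usual
# timeout-mismatch and scrub commands. Helper names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

fix_timeouts() {
    # $@: bare disk names such as sdb sdc
    for d in "$@"; do
        # Ask the drive to stop retrying a bad sector after 7.0 s
        # (smartctl takes the value in units of 0.1 s).
        run "smartctl -l scterc,70,70 /dev/$d"
        # Raise the kernel's per-command timeout to 180 s. This matters
        # mainly for drives that reject the scterc setting above.
        run "echo 180 > /sys/block/$d/device/timeout"
    done
}

start_scrub() {
    # $1: md device name such as md0; "check" starts a read-only scrub.
    run "echo check > /sys/block/$1/md/sync_action"
}

fix_timeouts sdb sdc sdd sde sdf
start_scrub md0
```

Neither setting survives a reboot or power cycle, so they would normally
be reapplied from a boot script.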

> I am convinced the data is fine on /dev/sd[c-f]1, and that somehow I
> just need to inform mdadm about that, but they got out of sync and
> /dev/sde1 thinks the array is AAAAA while the others think it's AAA.. .
> The drives also seem to think e is bad because f said e was bad or some
> weird stuff, and sde1 is behind by ~50 events or so. That error hasn't
> shown itself recently. I fear sdb is bad and sde is going to go soon.

Please show your dmesg from the start of the problem.  Also show
"smartctl -x /dev/sdX" for each of the member devices.  Also show an
excerpt from "ls -l /dev/disk/by-id/" that shows the device vs. serial
number relationship for your drives.
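
One way to gather all of that in a single pass is sketched below. It
assumes the members are sdb through sdf, as in this thread; the function
name and the report file name in the usage note are hypothetical. The
function only lists the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: list the commands that gather the information requested
# above. Review the output, then pipe it to "sh" as root to run them.
diag_commands() {
    # $@: member disks as bare names, e.g. sdb sdc sdd sde sdf
    echo "dmesg"
    for d in "$@"; do
        echo "smartctl -x /dev/$d"
    done
    echo "ls -l /dev/disk/by-id/"
}

diag_commands sdb sdc sdd sde sdf
```

To actually collect the output into one file for pasting into a reply:
`diag_commands sdb sdc sdd sde sdf | sh > raid-report.txt 2>&1`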

> Results of `mdadm --examine /dev/sd[b-f]1` are here
> http://dpaste.com/2Z7CPVY

Just put the results in the email in the future.  Kernel.org tolerates
relatively large messages.

> I'm scared and alone. Everything is off and sitting as above, though e
> is 50 events behind and out of sync. New drives coming Friday and backup
> is of course a bit old. I'm petrified to execute `mdadm --create
> --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/sdf1 /dev/sdd1
> /dev/sdc1 /dev/sde1 missing`,

You should be petrified of any '--create' operation.  What you've shown
above would certainly *not* work, thanks to your data offsets.
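
The data offset is recorded per device in the `mdadm --examine` output,
and a `--create` that places data at a different offset from the original
will not line up with the data already on disk. As a rough sketch, the
filter below condenses that output to one line per device. It assumes the
field labels "Data Offset", "Events" and "Array State" as printed for
version 1.x metadata; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: condense "mdadm --examine" output (read from stdin) to one
# line per device. Assumes v1.x metadata field labels; fields that are not
# found are shown as "?".
summarize_examine() {
    awk '
        function emit() {
            if (dev != "")
                print dev, "offset=" off, "events=" ev, "state=" st
        }
        /^\/dev\//      { emit(); dev = $1; sub(/:$/, "", dev); off = ev = st = "?" }
        /Data Offset :/ { off = $4 }
        /^ *Events :/   { ev = $3 }
        /Array State :/ { st = $4 }
        END             { emit() }
    '
}

# Usage (as root):
#   mdadm --examine /dev/sd[c-f]1 | summarize_examine
```

Differing offset values between members, or an events value that lags
the others, stand out immediately in the condensed form.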

> but that seems my next option unless y'all know better. I tried
> `mdadm --assemble -f /dev/md0 /dev/sdf1 /dev/sdd1 /dev/sdc1 /dev/sde1`
> and it said something like can't start with only 3 devices (which I
> wouldn't expect because examine still shows 4, just that they are out
> of sync and I thought that was -f's express purpose in assemble mode).
> Anyone have any suggestions? Thanks!

Show the contents of /proc/mdstat, then show the results of:

mdadm --stop /dev/md0
mdadm --assemble --force --verbose /dev/md0 /dev/sd[cdef]1

Phil