Thread: 20 messages, 4 authors, 2013-02-10

Re: RAID5 with 2 drive failure at the same time

From: Christoph Nelles <hidden>
Date: 2013-02-03 15:56:35

Hi folks,

the dd_rescue to the new HDD took 14 hours. It looks like ddrescue does not
read and write in parallel. In the end, 8 kB couldn't be read after
10 retries.

I just force-assembled the RAID with the new drive, but it failed almost
immediately with a WRITE FPDMA QUEUED error on one of the other drives
(sdj, formerly sdi). I immediately tried again, and this time one disk
was rejected but the RAID started on 8 devices. Then xfs_repair failed
when one of the disks failed with a READ FPDMA QUEUED error :( and md
expelled that disk from the RAID.



It looks more like a controller problem, as the messages coming from
the drives on the PCIe Marvell all contain the line
ataXX: illegal qc_active transition (00000002->00000003)
I found only one similar report about that problem:
http://marc.info/?l=linux-ide&m=131475722021117
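
To see whether the error really clusters on the ports of one controller, the kernel log can be tallied per ATA port. The following is a minimal Python sketch, not a finished tool: the pattern is modelled only on the single message quoted above, the function name count_qc_active_errors is made up for illustration, and the rest of the log line format (timestamps, host prefix) is assumed to be arbitrary.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this mail. Everything before
# "ataNN:" (timestamp, hostname, "kernel:") is an assumption, so the pattern
# is searched anywhere in the line instead of being anchored.
QC_ACTIVE = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: illegal qc_active transition "
    r"\(([0-9a-fA-F]{8})->([0-9a-fA-F]{8})\)"
)


def count_qc_active_errors(lines):
    """Return a Counter mapping ATA port name (e.g. 'ata7') to hit count."""
    hits = Counter()
    for line in lines:
        match = QC_ACTIVE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits
```

Feeding it an open log file, for example count_qc_active_errors(open("kernel_20130203.log", errors="replace")), gives one count per port. If only the ports behind the Marvell show up, that would support the controller theory; it would not prove it.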

Any recommendations for a decent and affordable SATA Controller with at
least 4 ports and faster than PCIe x1? Looks like there are only
Marvells and more expensive Enterprise RAID controllers.



Currently the RAID is running clean, but degraded. The filesystem is
mounted ro and looks healthy. I attached the output of mdadm --detail and
put the kernel logs since yesterday at
http://evilazrael.net/bilder2/logs/kernel_20130203.log and
http://evilazrael.net/bilder2/logs/kernel_20130203.log.gz

I think my action plan is:
- Get a reliable controller ASAP
- Re-add the missing disk
- Upgrade to RAID 6
- Schedule regular scrubbing
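
The software side of this plan can be sketched as a dry run in Python that only builds the mdadm command lines and prints them, so nothing touches the array by accident. All device names, the device count and the backup file path below are hypothetical placeholders, and the helper names plan_commands and scrub_trigger_path are made up for illustration. Note the assumptions: --re-add may be refused if the disk has fallen too far behind (a plain --add with a full rebuild would then be the fallback), and going from RAID 5 to RAID 6 is assumed here to use one additional disk for the second parity.

```python
import shlex


def plan_commands(array, readd_dev, new_dev, total_devices, backup_file):
    """Build (but do not run) the mdadm invocations for the plan above."""
    return [
        # Put the expelled member back into the array.
        ["mdadm", array, "--re-add", readd_dev],
        # Add one more disk, which will hold the second parity of RAID 6.
        ["mdadm", array, "--add", new_dev],
        # Reshape RAID 5 to RAID 6 across all devices.
        ["mdadm", "--grow", array, "--level=6",
         f"--raid-devices={total_devices}",
         f"--backup-file={backup_file}"],
    ]


def scrub_trigger_path(md_name):
    """sysfs file that starts a scrub when the string 'check' is written."""
    return f"/sys/block/{md_name}/md/sync_action"


# Hypothetical names; replace them after verifying the serial number mapping.
for argv in plan_commands("/dev/md0", "/dev/sdX1", "/dev/sdY1", 10,
                          "/root/md0-grow.backup"):
    print(shlex.join(argv))
print(f"echo check > {scrub_trigger_path('md0')}")
```

The last printed line is the part that would go into a cron job for the regular scrubbing; the mismatch count can be read from the md sysfs directory of the array once the check has finished.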

Thanks for all the help so far, I think I can see the light at the end
of the tunnel :)


On 03.02.2013 02:22, Phil Turmel wrote:
>> How do the serial numbers help?
> It is vital to keep track of raid device number (logical position in the
> array) versus drive serial numbers, as device names are not guaranteed
> to be consistent between boots (and certainly not when mucking around
> with cables and connectors).

I am aware of that problem when plugging drives around or adding new
ones during runtime.

> When you are done with dd_rescue, make sure of the mapping again.
> lsdrv[1] gives you both pieces of information in one utility, you might
> find it easier than mapping by hand.

The owner's name sounds familiar ;) Will send you a mail later.
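
For cross-checking the mapping by hand, here is a minimal Python sketch that lists which kernel device name each persistent id currently points to. It assumes udev populates /dev/disk/by-id with symlinks whose names contain model and serial number, and that partition links end in -partN; the function name map_ids_to_devices is made up for illustration. It is not a replacement for lsdrv, which also shows the RAID membership.

```python
import os
import re

# Partition symlinks are assumed to end in "-part<N>" and are skipped.
PARTITION_SUFFIX = re.compile(r"-part\d+$")


def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Return {id_name: kernel device name} for whole disks only."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        if PARTITION_SUFFIX.search(name):
            continue
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # Resolve the symlink and keep only the last path component.
            mapping[name] = os.path.basename(os.path.realpath(link))
    return mapping
```

Running it once before and once after recabling and comparing the two dictionaries shows which serial numbers changed their device name. The RAID device number of each member still has to come from mdadm --examine or lsdrv.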



Kind regards

Christoph Nelles


-- 
Christoph Nelles

E-Mail    : evilazrael@evilazrael.de
Jabber    : eazrael@evilazrael.net      ICQ       : 78819723

PGP-Key   : ID 0x424FB55B on subkeys.pgp.net
            or http://evilazrael.net/pgp.txt
