Re: All disks are reported as spare disks

From: Rickard Svensson <hidden>
Date: 2020-02-02 16:30:22

Hi Phil & Wol and everyone else.

I just wanted to say a big thank you, --assemble --force solved the
problem and I got the raid running again :-D

And now, after an fsck, I am copying all the data to my new raid1.
And from what I can see so far, I don't seem to have lost anything :-)

The new disks (WD Red NAS 10TB) had already been purchased, but
fortunately they support SCT "Error Recovery Control",
"Feature Control" and "Data Table".
And "Error Recovery Control" is set to 70.70, just as mentioned on:
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

But I will still put that script into the startup of my new server.
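
For reference, a minimal sketch of what such a startup check could look
like, loosely following the idea on the wiki page above. It assumes
smartctl (smartmontools) is installed and that it runs as root. The
function name fix_timeout and the SYSFS_ROOT variable are made up for
this example; SYSFS_ROOT only exists so the function can be tried
against a scratch directory.

```shell
# Try to set SCT ERC to 7.0 seconds (70 deciseconds) for reads and writes.
# If the drive refuses, raise the kernel's command timeout to 180 seconds
# instead, so the kernel does not give up before the drive does.
fix_timeout() {
    local dev="$1"               # e.g. /dev/sda (hypothetical name)
    local name="${dev##*/}"      # strip the /dev/ prefix -> sda
    local sysfs="${SYSFS_ROOT:-/sys}/block/$name/device/timeout"

    if smartctl -l scterc,70,70 "$dev" > /dev/null 2>&1; then
        echo "$dev: SCT ERC set to 7.0 seconds"
    else
        echo 180 > "$sysfs"
        echo "$dev: no SCT ERC, kernel timeout raised to 180 s"
    fi
}
```

It would be called once per array member at boot, for example
for d in /dev/sd?; do fix_timeout "$d"; done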


Once again, a big thanks for all the help!

Best regards Rickard

On Fri, 31 Jan 2020 at 14:57, Phil Turmel [off-list ref] wrote:

Hi Rickard,

Good report.

On 1/30/20 6:48 PM, Rickard Svensson wrote:

quoted

Hello

Excuse me for asking again.

But this is a simpler(?) follow-up question to:
https://marc.info/?t=157895855400002&r=1&w=2

In short: I had a raid10 array. There were too many write errors
on one disk (I call it DiskError1), which I did not notice, and then
two days later the same problem occurred on another disk (I call it
DiskError2).

I got good help here, and copied the disk portions of the 2 working
disks as well as DiskError2 with ddrescue to new disks.
Later I'll create a new raid1, so I don't plan to reuse the same raid10 again.


My questions:
1) I haven't copied DiskError1, because it holds older data and
shouldn't be needed. Or is it better to add that one as well?

2) Everything looks pretty good :)
But all disks are reported as spare disks in /proc/mdstat.
I assume that is because the "Events" count is not the same. It is the
same on the good disks (2864) but not on DiskError2 (2719).

No, the array isn't running, so /proc/mdstat isn't complete.  Your three
disks all have proper "Active device" roles per --examine.
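
To compare the members side by side, something like the following
sketch can help. It only reads metadata and changes nothing on disk.
The function name show_events is invented for this example, and the
grep pattern assumes the usual "Events : N" and "Device Role : ..."
lines that --examine prints for v1.x superblocks.

```shell
# Print the event counter and device role for each member passed in.
show_events() {
    local dev
    for dev in "$@"; do
        echo "== $dev"
        mdadm --examine "$dev" | grep -E '^ *(Events|Device Role) :'
    done
}
```

For example: show_events /dev/sdb1 /dev/sdc1 /dev/sdd1 (placeholder
device names). Members whose event counts differ are the ones that a
plain --assemble would refuse to include.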

quoted

I have been looking at how I can "force add" DiskError2: should I use
"--force" or "--zero-superblock"?

Neither --add nor --zero-superblock is appropriate.  They will break
your otherwise very good condition.

quoted

But I would prefer to avoid making a mistake now. What has the
greatest chance of being right? :)

First, ensure you do not have a timeout mismatch as evidenced in your
original thread's smartctl output.  The wiki has some advice.  Hopefully
your new drives are "NAS" rated and you need no special action.

Then you should simply use --assemble --force with those three devices.

That should get you running degraded.  Then immediately back up the most
valuable data in the array before doing anything else.

Finally, --add a fourth device and let your raid rebuild its redundancy.
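
The steps above could be sketched roughly as follows. This is an
illustration under assumptions, not a tested recipe: /dev/md0 and the
member names in the usage line are placeholders, the helper names run,
assemble_degraded and add_replacement are invented here, and the
initial --stop assumes the array is currently sitting inactive with all
members listed as spares. By default the sketch only prints the
commands; set DRY_RUN=0 to actually execute them.

```shell
# Echo each command; execute it only when DRY_RUN is set to something
# other than 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@"
}

# Forced assembly from the members that still carry an
# "Active device" role, then a look at the result.
assemble_degraded() {
    local md="$1"; shift
    run mdadm --stop "$md"                   # release the inactive array first
    run mdadm --assemble --force "$md" "$@"
    run cat /proc/mdstat                     # should now show it active, degraded
}

# Only after the backup: add the replacement and let md rebuild.
add_replacement() {
    run mdadm "$1" --add "$2"
}
```

Usage would be along the lines of
assemble_degraded /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
followed, once the backup is done, by
add_replacement /dev/md0 /dev/sde1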

When all is safe, consider converting to a more durable redundancy
setup, like raid6, or raid10,near=3.

Phil
