Re: All disks are reported as spare disks
From: Rickard Svensson <hidden>
Date: 2020-02-02 16:30:22
Hi Phil & Wol and everyone else.

I just wanted to say a big thank you: --assemble --force solved the problem and I got the raid running again :-D

Now, after an fsck, I am copying all the data to my new raid1. From what I can see so far, I don't seem to have lost anything :-)

The new disks had already been purchased (WD Red NAS 10TB), but fortunately they support SCT "Error Recovery Control", "Feature Control" and "Data Table". "Error Recovery Control" is set to 70,70, just as mentioned on:
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
But I will still put that script into the startup of my new server.

Once again, a big thanks for all the help!

Best regards
Rickard

On Fri, 31 Jan 2020 at 14:57, Phil Turmel [off-list ref] wrote:
Hi Rickard,

Good report.

On 1/30/20 6:48 PM, Rickard Svensson wrote:
> Hello
>
> Excuse me for asking again. But this is a simpler(?) follow-up question to:
> https://marc.info/?t=157895855400002&r=1&w=2
>
> In short summary: I had a RAID 1+0. There were too many write errors on one disk (I call it DiskError1), which I did not notice, and then two days later the same problem occurred on another disk (I call it DiskError2).
> I got good help here, and copied the disk portions of the 2 working disks, as well as of DiskError2, to new disks with ddrescue.
> Later I'll create a new raid 1, so I don't plan to reuse the same RAID 1+0 again.
>
> My questions:
> 1) I haven't copied the disk DiskError1, because it holds older data and shouldn't be needed. Or is it better to add that one as well?
> 2) Everything looks pretty good :) But all disks are reported as spare disks in /proc/mdstat. I assume that is because the "Events" count is not the same. It is the same on the good disks (2864) but not on DiskError2 (2719).

No, the array isn't running, so /proc/mdstat isn't complete. Your three disks all have proper "Active device" roles per --examine.
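As a rough illustration of the "Events" comparison discussed here, the sketch below finds the members whose event count lags the newest one. It is not mdadm's actual logic, only a toy model of why a plain assemble treats a lagging member as out of date while --assemble --force accepts it anyway. The function name and the device labels are hypothetical; the counts are the ones quoted in this thread.

```python
def stale_members(events):
    """Return {device: events_behind} for members lagging the newest count."""
    newest = max(events.values())
    return {dev: newest - count for dev, count in events.items() if count < newest}


# Event counts as reported in the thread; the labels are placeholders.
events = {"good1": 2864, "good2": 2864, "DiskError2": 2719}
print(stale_members(events))  # {'DiskError2': 145}
```

A member that is only slightly behind, like DiskError2 here, is the case --assemble --force is meant for.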
> I have been looking at how I can "force add" disk DiskError2: use "--force" or "--zero-superblock"?

Neither --add nor --zero-superblock is appropriate. They will break your otherwise very good condition.
> But I would prefer to avoid making a mistake now. What has the greatest chance of being right? :)

First, ensure you do not have a timeout mismatch as evidenced in your original thread's smartctl output. The wiki has some advice. Hopefully your new drives are "NAS" rated and you need no special action.

Then you should simply use --assemble --force with those three devices. That should get you running degraded.

Then immediately back up the most valuable data in the array before doing anything else.

Finally, --add a fourth device and let your raid rebuild its redundancy.

When all is safe, consider converting to a more durable redundancy setup, like raid6, or raid10,near=3.

Phil
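For readers following along, the timeout-mismatch check mentioned above can be sketched as a small decision function. It makes several assumptions: the SCT ERC value is in units of 100 ms (so 70 means 7.0 seconds), the kernel's per-command timeout is the usual Linux default of 30 seconds, and 180 seconds is the commonly suggested fallback for drives without ERC. The function and constant names are hypothetical, and nothing here reads real drive state.

```python
DEFAULT_KERNEL_TIMEOUT_S = 30   # assumed default of /sys/block/<dev>/device/timeout
FALLBACK_TIMEOUT_NO_ERC_S = 180  # assumed fallback for drives without ERC


def has_timeout_mismatch(erc_deciseconds, kernel_timeout_s=DEFAULT_KERNEL_TIMEOUT_S):
    """Return True if the drive may keep retrying longer than the kernel waits.

    erc_deciseconds: SCT ERC limit in units of 100 ms, or None/0 when ERC
    is unsupported or disabled.
    """
    if not erc_deciseconds:
        # No ERC limit: only a long kernel timeout avoids the mismatch.
        return kernel_timeout_s < FALLBACK_TIMEOUT_NO_ERC_S
    return erc_deciseconds / 10 >= kernel_timeout_s


print(has_timeout_mismatch(70))         # ERC 7.0 s vs 30 s kernel timeout
print(has_timeout_mismatch(None))       # no ERC, default kernel timeout
print(has_timeout_mismatch(None, 180))  # no ERC, raised kernel timeout
```

With ERC at 70,70 as reported for the new drives, the first call reports no mismatch. A drive without ERC needs the raised kernel timeout instead.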