Re: Feature request: Remove the badblocks list
From: Adam Goryachev <hidden>
Date: 2020-09-02 15:09:49
On 3/9/20 00:50, Roy Sigurd Karlsbakk wrote:
I'm no MD expert, but there are a couple of things to consider...

1) MD doesn't mark the sector as bad unless we try to write to it AND the drive replies that it could not be written. So, in your case, the drive is saying that it doesn't have any "spare" sectors left to re-allocate; we are already past that point.

2) When MD tries to read and gets an error, it reads from the other mirror, or reconstructs from parity/etc., and automatically attempts to write to the sector; see point 1 above for the failure case.

So by the time MD gets a write error for a sector, the drive really is bad, and MD can no longer ensure that *this* sector will be able to properly store data again (whatever level of RAID we asked for, that level can't be achieved with one drive faulty). So MD marks it bad, and won't store any user data in that sector in future. As other drives are replaced, we mark the corresponding sector on those drives as bad too, so they also know that no user data should be stored there. Eventually, we replace the faulty disk, and it would probably be safe to store user data in the marked sector (assuming the new drive is not faulty on the same sector, and all other member drives are not faulty on the same sector either).

So, to "fix" this, we just need a way to tell MD to try to write to all member drives, on all faulty sectors: if any drive fails the write, keep the sector marked bad; if *ALL* drives succeed, remove it from the bad blocks list on all members. So why not add this feature to fix the problem, instead of throwing away something that is potentially useful? Perhaps this could be done as part of the "repair" mode, or during a replace/add (when we reach the "bad" sector, test the new drive, test all existing drives, and then continue with the repair/add). Would that solve the "bug"?

I'd rather md stopped fixing "somebody else's problem", that is, the disk's, and just did its job.
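The re-test pass proposed above can be sketched as a small Python simulation. This is a minimal toy model, not md's code or interface: `Member`, `try_write` and `retest_bad_blocks` are hypothetical names, and each member is assumed to be just a set of genuinely unwritable sectors plus its own bad-block list (BBL).

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One array member in a toy model (hypothetical, not md's structures)."""
    name: str
    # Sectors the drive genuinely cannot write (simulated media faults).
    unwritable: set[int] = field(default_factory=set)
    # Bad-block list (BBL) md has recorded for this member.
    bbl: set[int] = field(default_factory=set)

    def try_write(self, sector: int) -> bool:
        """Simulated test write: False means the drive rejected it."""
        return sector not in self.unwritable


def retest_bad_blocks(members: list[Member]) -> set[int]:
    """Re-test every sector that appears in any member's BBL.

    A sector is removed from all BBLs only if the test write succeeds
    on every member; otherwise it stays marked bad on all of them.
    Returns the sectors that remain bad.
    """
    candidates = set().union(*(m.bbl for m in members))
    still_bad = set()
    for sector in sorted(candidates):
        # Build a list (not a generator) so every member is written,
        # instead of stopping at the first failure.
        results = [m.try_write(sector) for m in members]
        if all(results):
            for m in members:
                m.bbl.discard(sector)
        else:
            still_bad.add(sector)
            # Keep it marked on every member, like the corresponding-sector
            # marking described above.
            for m in members:
                m.bbl.add(sector)
    return still_bad


# Sector 200 is still unwritable on the older drive; sector 100 now works everywhere.
replaced = Member("sda", bbl={100, 200})
older = Member("sdb", unwritable={200}, bbl={100, 200})
print(retest_bad_blocks([replaced, older]))  # {200}
```

Overwriting the sectors during the test is harmless in this model because, as described above, MD stores no user data in a sector once it is on the BBL.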
As for my case, I have tried to manually read the sectors named in the badblocks list, and they all work. All of them. But there's no fixing them, since they are declared dead. So are the sectors with the same numbers on the sibling drives, regardless of status.
Just because you can read them, doesn't mean you can write them. Clearly, at some point in time, one of your drives failed. You now need to recover from that failed drive in the most sensible way.
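Points 1 and 2 of the quoted message describe the order of events as Adam understands it: a read error triggers a rewrite from redundancy, and only a failed rewrite puts the sector on the BBL. Below is a simplified Python simulation of that model for a two-way mirror. `Disk` and `mirror_read` are hypothetical names, and this is an illustration of the described behaviour, not md's actual code path. Reads and writes are tracked separately, matching the point that a readable sector is not necessarily writable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disk:
    """Toy drive model (hypothetical, not md's implementation)."""
    data: dict = field(default_factory=dict)       # sector -> bytes
    unreadable: set = field(default_factory=set)   # sectors that fail reads
    unwritable: set = field(default_factory=set)   # no spare sectors left to remap
    bbl: set = field(default_factory=set)          # md's bad-block list

    def read(self, sector: int) -> Optional[bytes]:
        if sector in self.unreadable:
            return None
        return self.data.get(sector, b"\x00")

    def write(self, sector: int, payload: bytes) -> bool:
        if sector in self.unwritable:
            return False
        # A successful write lets the drive remap the sector, so it reads again.
        self.unreadable.discard(sector)
        self.data[sector] = payload
        return True


def mirror_read(primary: Disk, mirror: Disk, sector: int) -> Optional[bytes]:
    """Read one sector from a two-way mirror, following points 1 and 2."""
    payload = primary.read(sector)
    if payload is not None:
        return payload
    # Point 2: the read failed, so take the data from the other mirror...
    payload = mirror.read(sector)
    if payload is None:
        return None  # no good copy left anywhere
    # ...and automatically write it back to the failing drive.
    if not primary.write(sector, payload):
        # Point 1: only a failed write puts the sector on the BBL.
        primary.bbl.add(sector)
    return payload
```

In this model a sector lands on the BBL only when the drive has no spare left to remap it; a read error alone is repaired by the rewrite.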
If a drive has multiple issues with bad sectors, kick it out. It has no business being in the RAID anymore.
And if a group of 100 sectors are bad on drive 1, and 100 different sectors on drive 2, you want to kick both drives out, and destroy all your data until you can create a new array and restore from backup? OR just mark those parts of all disks faulty, and at some point in the future replace the disks, and then find a way to tell MD that the sectors are working now (and preferably re-test them before marking them as OK)?

BTW, I just found this: https://raid.wiki.kernel.org/index.php/The_Badblocks_controversy which suggests that there is indeed a bug which should be hunted down and fixed, and that the BBL actually isn't populated via failed writes; it is populated by failed reads while doing a replace/add, where the read fails on the source drive AND on the parity/mirror drives.

Either way, perhaps what is needed (if you are interested) is a repeatable test scenario causing the problem, which could then be used to identify and fix the bug.

Regards,
Adam