Thread (14 messages, 7 authors, 2020-09-02)

Re: Feature request: Remove the badblocks list

From: Roy Sigurd Karlsbakk <hidden>
Date: 2020-09-02 14:51:53

I'm no MD expert, but there are a couple of things to consider...

1) MD doesn't mark the sector as bad unless we try to write to it AND
the drive replies that it could not be written. So, in your case, the
drive is saying that it doesn't have any "spare" sectors left to
re-allocate; we are already past that point.

2) When MD tries to read and gets an error, it reads from the other
mirror (or reconstructs from parity, etc.) and automatically attempts
to rewrite the sector; see point 1 above for the failure case.
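The read-error path in points 1) and 2) can be sketched as a toy two-way mirror. This is a minimal model under stated assumptions, not MD's code: the `Drive` class, its fields and `mirrored_read` are hypothetical, and a drive that can no longer remap a sector is modelled by listing it in `unwritable`.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Toy model of one RAID member (hypothetical, not MD's real data structures)."""
    name: str
    data: dict = field(default_factory=dict)      # sector -> bytes
    unreadable: set = field(default_factory=set)  # sectors that currently fail on read
    unwritable: set = field(default_factory=set)  # sectors the drive can no longer remap
    badblocks: set = field(default_factory=set)   # MD's bad block list for this member

    def read(self, sector):
        if sector in self.unreadable:
            raise OSError(f"{self.name}: read error at sector {sector}")
        return self.data.get(sector, b"\0")

    def write(self, sector, payload):
        if sector in self.unwritable:
            raise OSError(f"{self.name}: write error at sector {sector}")
        # The drive rewrote the sector in place or remapped it to a spare,
        # so it reads fine again afterwards.
        self.data[sector] = payload
        self.unreadable.discard(sector)


def mirrored_read(primary, mirror, sector):
    """Read one sector from a two-way mirror, following points 1) and 2)."""
    if sector in primary.badblocks:
        return mirror.read(sector)         # already marked bad: skip primary entirely
    try:
        return primary.read(sector)
    except OSError:
        data = mirror.read(sector)         # 2) fall back to the other mirror
        try:
            primary.write(sector, data)    # 2) automatically rewrite the failed sector
        except OSError:
            primary.badblocks.add(sector)  # 1) only a failed *write* marks it bad
        return data


# One sector the drive can still remap (sda, 7), one it cannot (sdc, 5).
sda = Drive("sda", unreadable={7})
sdc = Drive("sdc", unreadable={5}, unwritable={5})
sdb = Drive("sdb", data={5: b"x", 7: b"y"})
print(mirrored_read(sda, sdb, 7), sda.badblocks)  # b'y' set()
print(mirrored_read(sdc, sdb, 5), sdc.badblocks)  # b'x' {5}
```

Note that a read error alone never adds to the list; only the failed rewrite does.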

So by the time MD gets a write error for a sector, the drive really is
bad, and MD can no longer ensure that *this* sector will be able to
properly store data again (whatever level of RAID we asked for, that
level can't be achieved with one drive faulty). So MD marks it bad, and
won't store any user data in that sector in the future. As other drives
are replaced, we mark the corresponding sector on those drives as bad as
well, so they also know that no user data should be stored there.
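The propagation described above, and why the marks can outlive every disk that was actually faulty, can be sketched like this. The `Member` class and `replace_member` helper are hypothetical, and the sketch assumes a layout with no redundancy left during the rebuild (two-disk RAID1 or RAID5), so a sector that is bad on any survivor cannot be rebuilt onto the new drive.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A RAID member and its MD bad block list (toy model, not MD's structures)."""
    name: str
    badblocks: set = field(default_factory=set)


def replace_member(members, old_index, new_name):
    """Hypothetical helper: swap members[old_index] for a fresh drive.

    Assumes no redundancy remains while rebuilding, so any sector marked
    bad on a surviving member cannot be reconstructed and is marked bad
    on the new drive too.
    """
    survivors = [m for i, m in enumerate(members) if i != old_index]
    inherited = set().union(*(m.badblocks for m in survivors))
    members[old_index] = Member(new_name, inherited)
    return members[old_index]


# Sector 5 went bad on sda; replace sdb first, then sda itself.
array = [Member("sda", {5}), Member("sdb")]
sdc = replace_member(array, 1, "sdc")  # sdc inherits {5} from sda
sdd = replace_member(array, 0, "sdd")  # sdd inherits {5} from sdc, though sda is gone
print([(m.name, m.badblocks) for m in array])  # [('sdd', {5}), ('sdc', {5})]
```

After both replacements neither original disk is present, yet sector 5 is still marked bad on both new members.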

Eventually, we replace the faulty disk, and it would probably be safe to
store user data in the marked sector (assuming the new drive is not
faulty on the same sector, and all other member drives are not faulty on
the same sector).

So, to "fix" this, we just need a way to tell MD to try to write to all
member drives on all faulty sectors: if any drive fails the write, keep
the sector marked bad; if *ALL* drives succeed, remove it from the bad
blocks list on all members.
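The proposed retest can be sketched per sector as follows. This is an illustration of the idea, not an existing MD feature: `retest_bad_sector` is hypothetical, and `test_write(name, sector)` is an assumed callback that writes to that sector on the named drive and returns whether the drive accepted it.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A RAID member and its MD bad block list (toy model)."""
    name: str
    badblocks: set = field(default_factory=set)


def retest_bad_sector(members, sector, test_write):
    """Clear `sector` from every member's list only if every member can write it."""
    # Build a list (not a generator) so every drive is tested, even after a failure.
    results = [test_write(m.name, sector) for m in members]
    if all(results):
        for m in members:
            m.badblocks.discard(sector)  # all drives can store data there again
        return True
    return False  # at least one drive failed: keep the marks on all members


array = [Member("sdc", {5, 9}), Member("sdd", {5, 9})]
still_failing = {("sdd", 9)}  # pretend sdd really cannot write sector 9


def fake_test_write(name, sector):
    return (name, sector) not in still_failing


print(retest_bad_sector(array, 5, fake_test_write),
      retest_bad_sector(array, 9, fake_test_write))     # True False
print([(m.name, sorted(m.badblocks)) for m in array])  # [('sdc', [9]), ('sdd', [9])]
```

A real implementation would also need to restore consistent data and parity for that stripe before clearing the marks; that part is left out here.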

So why not add this feature to fix the problem, instead of throwing away
something that is potentially useful? Perhaps this could be done as part
of the "repair" mode, or during a replace/add (when we reach the "bad"
sector, test the new drive, test all existing drives, and then continue
with the repair/add).

Would that solve the "bug"?

I'd rather have md stop fixing "somebody else's problem", that is, the disk's, and just do its job. As for my case, I have manually read the sectors named in the badblocks list, and they all work. All of them. But there's no fixing them, since they have been declared dead. So have the sectors with the same numbers on the sibling drives, regardless of their status.

If a drive has multiple bad sectors, kick it out. It no longer belongs in the RAID.
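The alternative policy, failing the whole member once it has accumulated bad sectors instead of keeping a per-sector list, fits in a few lines. The function name, the (drive, sector) log format and the threshold of 2 (one reading of "multiple") are assumptions for illustration, not md's actual behaviour.

```python
from collections import Counter


def drives_to_kick(write_errors, threshold=2):
    """Return drives with at least `threshold` distinct failed sectors.

    write_errors: iterable of (drive_name, sector) write failures (assumed format).
    """
    distinct = set(write_errors)  # repeated errors on the same sector count once
    per_drive = Counter(name for name, _sector in distinct)
    return sorted(name for name, n in per_drive.items() if n >= threshold)


log = [("sda", 5), ("sda", 5), ("sdb", 7), ("sdb", 9), ("sdc", 1)]
print(drives_to_kick(log))  # ['sdb']
```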

Kind regards

roy
-- 
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita. (Carve the good in stone; write the bad in snow.)