Re: end to end error recovery musings
From: Theodore Tso <tytso@mit.edu>
Date: 2007-02-24 02:32:29
Also in:
linux-fsdevel, linux-raid, linux-scsi
On Fri, Feb 23, 2007 at 05:37:23PM -0700, Andreas Dilger wrote:
> Probably the only sane thing to do is to remember the bad sectors and avoid attempting to read them; that would mean marking "automatic" versus "explicitly requested" requests to determine whether or not to filter them against a list of discovered bad blocks.
>
> And clearing this list when the sector is overwritten, as it will almost certainly be relocated at the disk level. For that matter, a huge win would be to have the MD RAID layer rewrite only the bad sector (in hopes of the disk relocating it) instead of failing the whole disk. Otherwise, a few read errors on different disks in a RAID set can take the whole system offline. Apologies if this is already done in recent kernels...
And having a way of making this list available to both the filesystem and a userspace utility, so they can more easily do a forced rewrite of the bad sector after determining which file is involved, and perhaps do something intelligent (up to and including automatically asking a backup system to fetch a backup version of the file and, if it can be determined that the file shouldn't have changed since the last backup, automatically fixing up the corrupted data block :-).

- Ted
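
For illustration only, the scheme discussed in this thread can be sketched in a few lines of user-space Python. This is not kernel code and not anything from the MD or block layer: the names `BadBlockList`, `record_read_error`, `should_attempt_read`, `note_write`, `snapshot` and `find_owner` are all hypothetical, and the extent table passed to `find_owner` is assumed to have been obtained by some other means. The sketch only shows the bookkeeping: automatic reads are filtered against the list, explicit reads are not, a write clears the entry, and the list can be handed to another party that maps a bad sector back to a file.

```python
from bisect import bisect_right


class BadBlockList:
    """Hypothetical in-memory list of sectors that returned read errors."""

    def __init__(self):
        self._bad = set()

    def record_read_error(self, sector):
        # Remember a sector that failed to read.
        self._bad.add(sector)

    def should_attempt_read(self, sector, explicit):
        # Explicitly requested reads always go to the disk; automatic
        # ones (readahead and the like) are filtered against the list.
        return explicit or sector not in self._bad

    def note_write(self, sector):
        # An overwrite will most likely make the drive relocate the
        # sector, so stop treating it as bad.
        self._bad.discard(sector)

    def snapshot(self):
        # Copy of the list for a filesystem or userspace consumer.
        return sorted(self._bad)


def find_owner(sector, extents):
    """Return the path owning `sector`, or None.

    `extents` is an assumed, pre-built list of (start, length, path)
    tuples, sorted by start and non-overlapping.
    """
    starts = [start for start, _, _ in extents]
    i = bisect_right(starts, sector) - 1
    if i >= 0:
        start, length, path = extents[i]
        if sector < start + length:
            return path
    return None
```

A consumer would walk `snapshot()`, call `find_owner()` on each entry to decide which file is affected, rewrite the sector (from a backup copy if one is known to be current), and then call `note_write()` so that automatic reads are allowed again.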