Re: end to end error recovery musings

From: Jeff Garzik <hidden>
Date: 2007-02-26 22:46:59
Also in: linux-fsdevel, linux-ide, linux-scsi

Theodore Tso wrote:

> Can someone with knowledge of current disk drive behavior confirm that
> for all drives that support bad block sparing, if an attempt to write
> to a particular spot on disk results in an error due to bad media at
> that spot, the disk drive will automatically rewrite the sector to a
> sector in its spare pool, and automatically redirect that sector to
> the new location?  I believe this should always be true, so presumably
> with all modern disk drives a write error should mean something very
> serious has happened.


This is what will /probably/ happen.  The drive should indeed find a 
spare sector and remap it if the write attempt encounters a bad spot on 
the media.

However, with a large enough write, a large enough bad spot on the 
media, and firmware programmed by the vendor never to take more than 
X seconds to complete any I/O for its enterprise customers, the write 
might just fail.
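To make the timeout scenario concrete, here is a toy Python model. It is entirely hypothetical: `ToyDrive`, the per-sector costs, and the assumption that remaps finished before the deadline persist after the command fails are illustrative guesses, not a description of any real drive firmware. It shows how one large write can exceed the per-command budget while the same sectors succeed when rewritten one at a time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical drive model: bad-block sparing plus a per-command time limit."""
    bad: set            # physical sectors with bad media
    spares: list        # unused spare sectors
    budget: float       # max seconds one command may take (the "X seconds")
    write_cost: float = 0.001   # assumed seconds to write one sector
    remap_cost: float = 0.5     # assumed extra seconds to remap one sector
    remap: dict = field(default_factory=dict)  # logical sector -> spare sector

    def write(self, start, count):
        """Write `count` sectors from `start`; return False if the command fails."""
        elapsed = 0.0
        for lba in range(start, start + count):
            elapsed += self.write_cost
            if self.remap.get(lba, lba) in self.bad:
                if not self.spares:
                    return False          # spare pool exhausted
                self.remap[lba] = self.spares.pop()
                elapsed += self.remap_cost
            if elapsed > self.budget:
                return False              # firmware gives up on this command
        return True


drive = ToyDrive(bad={10, 11, 12, 13}, spares=[1000 + i for i in range(8)], budget=1.0)
print(drive.write(0, 100))                              # False: two remaps exceed 1.0 s
print(all(drive.write(lba, 1) for lba in range(100)))   # True: at most one remap per command
print(drive.write(0, 100))                              # True: bad sectors already remapped
```

In this model the large command fails after remapping only two of the four bad sectors, and single-sector commands then finish the job because each one carries at most one remap.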


IMO, somewhere in the kernel, when we receive a media error on a read 
or a write, we should immediately try to plaster that area with small 
writes.  Sure, if it was a read you lost data, but this method will 
maximize the chance that you can refresh/reuse the logical sectors in 
question.
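A rough user-space approximation of the "plaster with small writes" idea, in Python, not kernel code. Assumptions: `plaster_region` is a hypothetical helper, the 512-byte sector size is assumed (many drives use 4096-byte logical sectors), and the caller already holds a writable file descriptor for the device. Because these writes go through the page cache, the sketch calls `os.fsync` after every sector so an error is more likely to be attributed to that sector; an in-kernel version would presumably issue the writes directly instead. Since the data is already lost after a read error, the region is filled with zeros.

```python
import os

SECTOR_SIZE = 512  # assumption: logical sector size of the device


def plaster_region(fd, start_sector, nr_sectors, sector_size=SECTOR_SIZE):
    """Hypothetical helper: rewrite a region that returned a media error,
    one sector per write, so the drive can remap each bad sector on its own
    instead of facing one large command it may time out on.

    The old contents are assumed lost, so zeros are written. Returns the
    list of sectors whose write or flush still failed.
    """
    zeros = bytes(sector_size)
    failed = []
    for sector in range(start_sector, start_sector + nr_sectors):
        try:
            written = os.pwrite(fd, zeros, sector * sector_size)
            # Flush each sector individually so an error surfaces here,
            # not later during writeback of an unrelated range.
            os.fsync(fd)
            if written != sector_size:
                failed.append(sector)
        except OSError:
            failed.append(sector)
    return failed
```

Sectors still listed as failed after this pass are the ones where even a single-sector write could not be remapped, which is the "something very serious has happened" case.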

	Jeff
