Thread: 17 messages, 6 authors, 2007-01-31

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

From: Ric Wheeler <hidden>
Date: 2007-01-31 14:37:14
Also in: linux-scsi, lkml


Jeff Garzik wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will 
generate 3-4 retries.  With existing ATA error recovery in the 
drives, that's about 3 seconds per retry on average, or 12 seconds 
per failure.  Multiply that by the number of blocks past the error to 
complete the request..
It really beats the alternative of a forced reboot
due to, say, superblock I/O failing because it happened
to get merged with an unrelated I/O which then failed..
Etc..
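
To put rough numbers on Eric's estimate quoted above, here is a minimal sketch. It assumes the worst case, where every remaining block in the request also fails, and it uses the upper end of the "3-4 retries" range. The function name and constants are hypothetical and purely illustrative; nothing here is measured or taken from libata.

```python
# Back-of-the-envelope delay estimate using the ballpark figures from the
# thread. These are illustrative assumptions, not measurements.
RETRIES_PER_FAILURE = 4    # "3-4 retries" per failure; upper bound assumed
SECONDS_PER_RETRY = 3.0    # "about 3 seconds per retry on average"


def recovery_delay_seconds(failing_blocks,
                           retries=RETRIES_PER_FAILURE,
                           seconds_per_retry=SECONDS_PER_RETRY):
    """Estimated time spent in retries if each of `failing_blocks` fails."""
    return failing_blocks * retries * seconds_per_retry


if __name__ == "__main__":
    print(recovery_delay_seconds(1))   # one bad block: 12.0 seconds
    print(recovery_delay_seconds(8))   # eight bad blocks in a row: 96.0 seconds
```

Under these assumptions the delay grows linearly with the number of failing blocks left in the request.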

FWIW -- speaking generally -- I think there are inevitable areas where 
libata error handling combined with SCSI error handling results in 
suboptimal error handling.

Just creating a list of "<this behavior> should be handled <this way>, 
but in reality is handled in <this silly way>" would be very helpful.

I agree - Tejun has done a great job of giving us a solid base. The next step is
to get clarity on what the types of errors are and how to differentiate between
them (and maybe how that would change by class of device?).

Error handling is tough to get right, because the code is exercised so 
infrequently.  Tejun has actually done an above-average job here, by 
making device probe, hotplug and other "exceptions" go through the 
libata EH code, thereby exercising the EH code more than one might 
normally assume.

Some errors in libata probably should not be retried more than once, 
when we have a definitive diagnosis.  Suggestions for improvements are 
welcome.

    Jeff

One thing that we find really useful is to inject real errors into devices. Mark
has some patches that let us inject media errors; we also bring back failed
drives, run them through testing, and occasionally get to use analyzers, etc.
to inject oddball errors.

Hopefully, we will get some time to brainstorm about this at the workshop.

ric