Re: [PATCH 1/6] libata: Do not retry commands with valid autosense
From: Hannes Reinecke <hare@suse.de>
Date: 2015-08-03 16:47:46
Also in: linux-scsi
On 08/03/2015 05:55 PM, Tejun Heo wrote:
> Hello, James.
>
> On Mon, Aug 03, 2015 at 08:42:43AM -0700, James Bottomley wrote:
>> I'd think it would be the same reason as all modern transports: it's
>> faster and allows processing of sense data in-band. Under the old
>> regime, the device is effectively frozen until you collect the data.
>> Under autosense, the data is collected as part of the in-band command
>> processing, so it doesn't stall the device. Modern drives (and
>> protocols) are moving towards being somewhat more chatty with sense
>> data. It doesn't just signal an error, mostly it's just reporting
>> about drive characteristics or other advisory stuff. This means that
>> if you handle it the old way, you'll get more drive stalls and a
>> corresponding reduction in throughput.
>
> The problem is not the "auto" part but the "sense" part, I guess. ATA
> devices (the harddisks) never reported sense data and instead had a
> more rudimentary error bits and for newer devices NCQ log pages, so
> libata EH decodes those error information and takes appropriate
> actions for the indicated error condition.
>
> Hannes's patchset makes ATA devices mostly bypass libata EH when sense
> data is present. For, say, unrecoverable read errors, it'd be possible
> to make this scheme work (broken currently tho); however, libata and
> SCSI aren't that closely tied and there currently is no way for SCSI
> to tell libata that, e.g., link error was detected on the device side,
> so libata will fail to take link recovery actions on those cases.
>
> This *can* be made to work in a couple different ways but what's
> implemented now is pretty broken and making it work properly in any
> other way than integrating sense decoding into libata EH would require
> major restructuring of the whole thing which I'm not sure would be
> worthwhile at this point.
At the moment NCQ autosense is mostly used to provide the host with
more details for a failed I/O. The typical case here is (no small
surprise) ZAC disks, which use autosense to inform the host about a
malformed I/O.

It is _not_ being used as a replacement for existing error behaviour
(i.e. link errors are not being signalled with that; how could they if
there is no link?). In fact, during testing I've seen both: autosense
I/O failures, and normal I/O failures for which autosense is not set
and the normal error handling kicks in.

It's not that I've disabled the original error handler completely;
it's only bypassed for I/O failures where a sense code is provided.
And the drive surely knows which error occurred, so we'd be daft not
to be using that.

So I think disabling autosense completely is a bit extreme...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
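
For readers following the thread, the behaviour described in the reply (bypass the error handler only when a failed command carries sense data, otherwise fall through to normal error handling) can be sketched roughly as below. This is an illustrative model only, not code from the patchset or from libata: `struct failed_cmd`, `enum eh_action` and `decide_eh_action()` are hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical model of a failed command; NOT a libata structure. */
struct failed_cmd {
    bool sense_valid;   /* device supplied sense data with the failure */
};

/* Hypothetical outcomes for the sketch. */
enum eh_action {
    EH_COMPLETE_WITH_SENSE, /* hand the sense data up, skip the retry path */
    EH_NORMAL_RECOVERY      /* run the regular error handling */
};

/*
 * Assumed policy, as described in the mail: the regular error handler
 * is bypassed only for failures that come with a sense code; every
 * other failure takes the normal path.
 */
static enum eh_action decide_eh_action(const struct failed_cmd *cmd)
{
    if (cmd->sense_valid)
        return EH_COMPLETE_WITH_SENSE;
    return EH_NORMAL_RECOVERY;
}
```

Tejun's objection fits the same picture: in a scheme like this, a failure that carries sense data never reaches the recovery branch, so any link problem reported that way would go unhandled.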