Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush
From: Kiyoshi Ueda <hidden>
Date: 2010-08-24 10:24:41
Also in: dm-devel, linux-fsdevel, linux-ide, linux-scsi, lkml
Hi Tejun,

On 08/23/2010 11:17 PM +0900, Mike Snitzer wrote:
> On Mon, Aug 23 2010 at 8:14am -0400, Tejun Heo [off-list ref] wrote:
>> On 08/20/2010 10:26 AM, Kiyoshi Ueda wrote:
>>> By the way, if this patch-set with the change above is included, even a single path failure for REQ_FLUSH on a multipath configuration will be reported to the upper layer as an error, although it is currently retried using other paths. Then, if an upper layer doesn't take the correct recovery action for the error, users would see it as a regression (e.g. frequent EXT3 errors resulting in a read-only mount on multipath configurations). Although I think an explicit error is better than implicit data corruption, please check the upper layers carefully so that users see such errors as rarely as possible.
>>
>> Argh... then it will have to discern why FLUSH failed. It can retry for transport errors, but if the flush was aborted by the device it should report upwards.
>
> Yes, we discussed this issue of needing to teach dm-multipath to tell whether there was a transport failure or not (at LSF). But I'm not sure when Hannes intends to repost his work in this area (updated to account for the feedback from LSF).
Yes, checking in the lower layer whether it's a transport error is the right solution. (Since I know that isn't available yet, I was just hoping the upper layers had some other options.) Anyway, simply reporting REQ_FLUSH errors to the upper layer without such a solution would make dm-multipath almost unusable in the real world, although that is still better than implicit data loss.
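A minimal sketch of the policy Tejun describes (retry a failed flush on another path for transport errors, report it upward when the device itself aborted it), assuming the lower layer can already classify the failure, which, as noted above, is not available yet. All type and function names here are hypothetical; they are not dm-multipath or block-layer identifiers.

```c
#include <assert.h>

/* Hypothetical classification of a completed REQ_FLUSH; these names are
 * illustrative only and do not exist in the kernel. */
enum flush_err_class {
    FLUSH_OK,
    FLUSH_ERR_TRANSPORT, /* path/link failure reported by the lower layer */
    FLUSH_ERR_DEVICE     /* the device received the flush and aborted it */
};

/* Hypothetical actions a multipath target could take on completion. */
enum flush_action {
    ACTION_COMPLETE,         /* report success upward */
    ACTION_RETRY_OTHER_PATH, /* fail this path, requeue on another one */
    ACTION_FAIL_UPWARD       /* complete the request with an error */
};

/* Decide what to do with a flush result, given how many usable paths
 * remain after the failing one is taken out of service. */
static enum flush_action flush_end_io_policy(enum flush_err_class err,
                                             int remaining_paths)
{
    assert(remaining_paths >= 0);

    switch (err) {
    case FLUSH_OK:
        return ACTION_COMPLETE;
    case FLUSH_ERR_TRANSPORT:
        /* Re-issuing a flush is harmless, so try another path if any. */
        return remaining_paths > 0 ? ACTION_RETRY_OTHER_PATH
                                   : ACTION_FAIL_UPWARD;
    case FLUSH_ERR_DEVICE:
    default:
        /* The device itself refused the flush; retrying on another path
         * to the same device would only hide the error. */
        return ACTION_FAIL_UPWARD;
    }
}
```

The point of the split is that only the device-aborted case reaches the filesystem, so a single path failure no longer turns into an EXT3 error.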
>> Maybe just turn off barrier support in mpath for now?
If that's possible, it could work as a short-term workaround. But how would you do it? I don't think it's enough to just drop the REQ_FLUSH flag from q->flush_flags. The underlying devices of an mpath device may have a write-back cache, and it may be enabled. So if an mpath device doesn't set REQ_FLUSH in q->flush_flags, it becomes a device that has a write-back cache but doesn't support flush. Wouldn't the upper layer then have no way to ensure the cache is flushed?

Thanks,
Kiyoshi Ueda
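A minimal sketch of the argument about q->flush_flags: a stacked device's flush capability has to follow the caches of the devices beneath it, so clearing the flush bit on the mpath queue while a path still has its write-back cache enabled leaves upper layers no way to force data to stable media. The struct, the CAP_FLUSH bit (a stand-in for REQ_FLUSH with an arbitrary value), and both functions are hypothetical; this is not the kernel's queue-stacking code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for REQ_FLUSH; the value is arbitrary, not the kernel's. */
#define CAP_FLUSH 0x1u

/* Hypothetical description of one underlying path device. */
struct path_dev {
    bool write_back_cache_enabled;
};

/* Flush capability the stacked device must advertise: if any underlying
 * device caches writes, the stacked device has to accept flushes. */
static unsigned int stacked_flush_flags(const struct path_dev *paths,
                                        size_t n)
{
    unsigned int flags = 0;

    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (paths[i].write_back_cache_enabled)
            flags |= CAP_FLUSH;
    }
    return flags;
}

/* True when the stacked device advertises no flush support even though
 * some underlying device has a volatile cache, i.e. the situation where
 * upper layers can no longer force cached data out. */
static bool flush_guarantee_lost(const struct path_dev *paths, size_t n,
                                 unsigned int advertised)
{
    return (stacked_flush_flags(paths, n) & CAP_FLUSH) &&
           !(advertised & CAP_FLUSH);
}
```

With one write-back path, advertising 0 makes `flush_guarantee_lost()` return true, which is why simply dropping the flag is not a safe way to turn off barrier support.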