Re: RAID-10 keeps aborting

From: Joe Lawrence <hidden>
Date: 2013-06-04 15:39:07


On Mon, 3 Jun 2013, H. Peter Anvin wrote:

On 06/03/2013 11:35 AM, Martin K. Petersen wrote:

"hpa" == H Peter Anvin [off-list ref] writes:

hpa> OK, so the device here says don't do this again, but fails the
hpa> request anyway expecting the block device to pick up the slack.

Yes, the block layer function will resort to writing out zeroes directly
in this case.

MD should not consider a rejected WRITE SAME a failure.
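
The fallback described above can be modelled in a few lines. This is only a toy Python sketch, not kernel code: `ToyDisk`, `WriteSameRejected`, and `zero_range` are hypothetical names made up for illustration, and the real block-layer logic differs in detail. It shows the intended behaviour: a rejected WRITE SAME is remembered so it is not retried, the zeroes are written explicitly instead, and the caller never sees an error.

```python
class WriteSameRejected(Exception):
    """Raised by the toy device when it refuses a WRITE SAME request."""


class ToyDisk:
    """Hypothetical in-memory disk; not a real kernel or library object."""

    def __init__(self, num_blocks, block_size=512, supports_write_same=False):
        self.block_size = block_size
        # Start with non-zero contents so zeroing is observable.
        self.blocks = [b"\xff" * block_size for _ in range(num_blocks)]
        self.supports_write_same = supports_write_same
        self.write_same_enabled = True  # cleared after the first rejection

    def write_same(self, start, count, pattern):
        """Replicate one block-sized pattern across `count` blocks."""
        if not self.supports_write_same:
            raise WriteSameRejected("device rejected WRITE SAME")
        for i in range(start, start + count):
            self.blocks[i] = pattern

    def write(self, start, data_blocks):
        """Ordinary write: one payload per block."""
        for offset, data in enumerate(data_blocks):
            self.blocks[start + offset] = data


def zero_range(disk, start, count):
    """Zero `count` blocks and return the name of the path that was used."""
    zero_block = bytes(disk.block_size)
    if disk.write_same_enabled:
        try:
            disk.write_same(start, count, zero_block)
            return "write_same"
        except WriteSameRejected:
            # Remember not to try again; the rejection is not an I/O failure.
            disk.write_same_enabled = False
    # Fallback: write the zeroes out directly.
    disk.write(start, [zero_block] * count)
    return "explicit_zeroes"
```

In this model a layer sitting on top of `zero_range` (MD, in the thread's terms) only ever sees a successful zeroing operation, whichever path was taken.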

We should probably add Joe Lawrence to this thread.

Joe: basically it seems that the error behavior of md (at least raid10,
but probably raid1 as well) on WRITE SAME is wrong, and it causes the
RAID to abort.

Martin is probably the expert here (I extended his initial WRITE SAME
support in MD raid0 to raid1 and raid10), but I can try failing a WRITE SAME
command using our San Blaze emulator to see the fallout.

Just curious, what type of drives were in your RAID, and what does
/sys/class/scsi_disk/*/max_write_same_blocks report?  If you have a spare
drive to test with, maybe you could try a quick sg_write_same command to see
how the drive reacts?
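
For anyone who wants to collect the sysfs values in one go, here is a minimal Python sketch. It assumes only the `/sys/class/scsi_disk/*/max_write_same_blocks` attribute mentioned above; the function name `read_max_write_same_blocks` and its `sysfs_root` parameter are made up for this example, and the parameter exists so the function can be pointed at a test directory.

```python
import glob
import os


def read_max_write_same_blocks(sysfs_root="/sys/class/scsi_disk"):
    """Return {device_id: value} for every SCSI disk found under sysfs_root.

    The value is an int, or None if the attribute could not be read or parsed.
    """
    results = {}
    pattern = os.path.join(sysfs_root, "*", "max_write_same_blocks")
    for path in sorted(glob.glob(pattern)):
        # The parent directory name identifies the device (e.g. "0:0:0:0").
        device_id = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                results[device_id] = int(f.read().strip())
        except (OSError, ValueError):
            results[device_id] = None  # unreadable or unexpected content
    return results


if __name__ == "__main__":
    for device_id, value in read_max_write_same_blocks().items():
        print(f"{device_id}: max_write_same_blocks = {value}")
```

Run as a script, it prints one line per disk, which is handy to paste into a reply.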

-- Joe
