Thread: 7 messages, 3 authors, 2015-05-13

Re: Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1

From: Shaohua Li <shli@kernel.org>
Date: 2015-05-12 20:55:15
Also in: lkml
Subsystem: software raid (multiple disks) support, the rest · Maintainers: Song Liu, Yu Kuai, Linus Torvalds

On Tue, May 12, 2015 at 02:55:46PM +0200, Maxime Ripard wrote:
> Hi Shaohua,
>
> On Sun, May 10, 2015 at 11:26:38PM -0700, Shaohua Li wrote:
> > On Thu, May 07, 2015 at 02:57:02PM +0200, Maxime Ripard wrote:
> > > Hi,
> > >
> > > I'm currently trying to add support for the PQ operations on the
> > > Marvell XOR engine, in dmaengine, obviously to be able to use async_tx
> > > to offload these operations.
> > >
> > > I'm testing these patches with a RAID6 array with 4 disks.
> > >
> > > However, since commit 59fc630b8b5f ("RAID5: batch adjacent full
> > > stripe write"), every write to that array fails with the following
> > > stack trace:
> > >
> > > http://code.bulix.org/eh8iew-88342?raw
> > >
> > > It seems to be generated by this warning:
> > >
> > > http://lxr.free-electrons.com/source/crypto/async_tx/async_tx.c#L173
> > >
> > > And indeed, if we dump the status of depend_tx there, it has already
> > > been acked.
> > >
> > > That doesn't happen if ASYNC_TX_DMA is disabled, that is, when the
> > > software implementation is used instead of our XOR engine. It
> > > doesn't happen on any commit prior to the one mentioned above, with
> > > the exact same changes applied. These changes are meant to be
> > > contributed, so I can definitely push them somewhere if needed.
> > >
> > > I don't really know where to look, though. The change that is
> > > causing this is probably the one in ops_run_reconstruct6, but I'm
> > > not sure that reverting it alone would work with regard to the
> > > rest of the patch.
> >
> > I don't have a machine with dmaengine; it's likely there is an error on
> > this side. Could you please make stripe_can_batch() always return false
> > and check whether the error disappears? This should narrow down whether
> > it's related to batching.
>
> The error indeed disappears if stripe_can_batch always returns false.

Does this fix it?

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 77dfd72..5e820fc 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1825,7 +1825,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
 	} else
 		init_async_submit(&submit, 0, tx, NULL, NULL,
 				  to_addr_conv(sh, percpu, j));
-	async_gen_syndrome(blocks, 0, count+2, STRIPE_SIZE,  &submit);
+	tx = async_gen_syndrome(blocks, 0, count+2, STRIPE_SIZE,  &submit);
 	if (!last_stripe) {
 		j++;
 		sh = list_first_entry(&sh->batch_list, struct stripe_head,
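
The one-line change above keeps the descriptor returned by async_gen_syndrome(), so the next stripe in the batch depends on it rather than on the descriptor obtained before the loop. Below is a minimal user-space sketch of that dependency chain. It is not kernel code: fake_desc, fake_submit and run_batch are hypothetical names, and the sketch assumes that submitting an operation acks the descriptor it depends on, which is what the reported warning suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 16

/* Hypothetical stand-in for a DMA descriptor; not the kernel's struct. */
struct fake_desc {
	int id;
	bool acked;
};

/* Number of submissions whose dependency had already been acked. */
static int stale_dependencies;

/*
 * Submit operation `id`, optionally depending on `dep`.
 * Assumption: the dependency is acked once a later operation has been
 * chained to it, so chaining to it a second time is an error.
 */
static struct fake_desc *fake_submit(struct fake_desc *pool, int id,
				     struct fake_desc *dep)
{
	assert(id >= 0 && id < POOL_SIZE);
	if (dep) {
		if (dep->acked)
			stale_dependencies++;	/* models the warning */
		dep->acked = true;
	}
	pool[id].id = id;
	pool[id].acked = false;
	return &pool[id];
}

/*
 * Submit one operation per stripe in a batch of `n` stripes.
 * keep_result == false models dropping the returned descriptor;
 * keep_result == true models "tx = ..." as in the patch.
 * Returns how many submissions used an already-acked dependency.
 */
static int run_batch(int n, bool keep_result)
{
	struct fake_desc pool[POOL_SIZE] = {0};
	struct fake_desc *tx;
	int i;

	assert(n >= 1 && n < POOL_SIZE);
	stale_dependencies = 0;

	/* An earlier operation that the first stripe depends on. */
	tx = fake_submit(pool, 0, NULL);

	for (i = 1; i <= n; i++) {
		struct fake_desc *next = fake_submit(pool, i, tx);

		if (keep_result)
			tx = next;	/* chain the next stripe to this one */
	}
	return stale_dependencies;
}
```

In this model, run_batch(3, false) reports two stale dependencies, because the second and third stripes both chain to the first descriptor after it was acked, while run_batch(3, true) reports none. A batch of one stripe never trips the check in either mode, which is consistent with the error disappearing when stripe_can_batch() always returns false.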