Re: Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1
From: Maxime Ripard <hidden>
Date: 2015-05-12 13:00:11
Also in:
lkml
Hi Shaohua,

On Sun, May 10, 2015 at 11:26:38PM -0700, Shaohua Li wrote:
> On Thu, May 07, 2015 at 02:57:02PM +0200, Maxime Ripard wrote:
> > Hi,
> >
> > I'm currently trying to add support for the PQ operations on the
> > Marvell XOR engine in dmaengine, in order to use async_tx to
> > offload these operations.
> >
> > I'm testing these patches with a RAID6 array with 4 disks.
> >
> > However, since commit 59fc630b8b5f ("RAID5: batch adjacent full
> > stripe write"), every write to that array fails with the following
> > stacktrace:
> >
> > http://code.bulix.org/eh8iew-88342?raw
> >
> > It seems to be generated by the warning here:
> >
> > http://lxr.free-electrons.com/source/crypto/async_tx/async_tx.c#L173
> >
> > And indeed, if we dump the status of depend_tx there, it has
> > already been acked.
> >
> > That doesn't happen if ASYNC_TX_DMA is disabled, which means the
> > software implementation is used instead of our XOR engine.
> >
> > It doesn't happen on any commit prior to the one mentioned above,
> > with the exact same changes applied. These changes are meant to be
> > contributed, so I can definitely push them somewhere if needed.
> >
> > I don't really know where to look, though. The change that is
> > causing this is probably the one in ops_run_reconstruct6, but I'm
> > not sure that this partial revert alone would work with regard to
> > the rest of the patch.
>
> I don't have a machine with dmaengine, it's likely there is an error
> on this side. Could you please make stripe_can_batch() always return
> false and check if the error disappears? This should narrow down
> whether it's related to batching.
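
For reference, the test suggested above boils down to short-circuiting the batching check. Below is a minimal, self-contained sketch of that idea, not the actual drivers/md/raid5.c code: the reduced `struct stripe_head`, its two fields and the `force_no_batch` switch are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct stripe_head (assumption). */
struct stripe_head {
	bool batch_ready;       /* models "stripe is ready to be batched" */
	bool full_stripe_write; /* models "this is a full stripe write" */
};

/* Hypothetical debug switch: when set, batching is never attempted. */
static bool force_no_batch = true;

static bool stripe_can_batch(const struct stripe_head *sh)
{
	/* Debug shortcut: always report "cannot batch". */
	if (force_no_batch)
		return false;

	/* Illustrative condition: only ready, full stripe writes batch. */
	return sh->batch_ready && sh->full_stripe_write;
}
```

With the switch set, every stripe takes the non-batched path, so a failure that goes away points at the batching code rather than at the DMA driver alone.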

The error indeed disappears if stripe_can_batch always returns false.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com