Re: Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1
From: Markus Stockhausen <hidden>
Date: 2015-05-07 14:49:28
Also in:
lkml
Hi Maxime,
From: linux-raid-owner@vger.kernel.org [linux-raid-owner@vger.kernel.org] on behalf of Maxime Ripard [maxime.ripard@free-electrons.com]
Sent: Thursday, 7 May 2015 14:57
To: Neil Brown; Shaohua Li
Cc: linux-raid@vger.kernel.org; linux-kernel@vger.kernel.org; Lior Amsalem; Thomas Petazzoni; Gregory Clement; Boris Brezillon
Subject: Possible RAID6 regression with ASYNC_TX_DMA enabled in 4.1
Hi,
I'm currently trying to add support for the PQ operations on the
marvell XOR engine, in dmaengine, obviously to be able to use async_tx
to offload these operations.
I'm testing these patches with a RAID6 array with 4 disks.
However, since commit 59fc630b8b5f ("RAID5: batch adjacent full
stripe write"), every write to that array fails with the following
stacktrace.
http://code.bulix.org/eh8iew-88342?raw
I don't know if it might be related. I added support for RAID6 Read-Modify-Write in software XOR with some patches. The following commit mangles some lines in async_pq.c:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=584acdd49cd2472ca0f5a06adbe979db82d0b4af
I introduced a new flag, ASYNC_TX_PQ_XOR_DST, that notifies the async layer that we want to do an XOR syndrome operation instead of a full calculation. This enforces the software path, because I guessed that hardware does not support that case. Without hardware to check, I might have missed some checks in the async layer.
In the upper layer, ops_run_reconstruct6 will set the flag if we determined that rmw is faster than rcw. Can you check whether rmw_level=0 fixes the issue? See:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d06f191f8ecaef4d524e765fdb455f96392fbd42
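To make the fallback described above concrete, here is a minimal userspace sketch of the decision. It is a model under stated assumptions, not the code in async_pq.c: every name prefixed with model_ is hypothetical, the flag value is illustrative, and the real async layer checks more conditions (alignment, continuation of long source lists) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the submit flag discussed above.
 * The numeric value is illustrative only. */
enum model_flags {
    MODEL_PQ_XOR_DST = 1u << 4,
};

/* Hypothetical description of what an offload engine can do. */
struct model_chan {
    bool has_pq;      /* engine advertises a P+Q operation */
    int  max_pq_srcs; /* most sources it accepts in one operation */
};

enum model_path { MODEL_PATH_SOFTWARE, MODEL_PATH_HARDWARE };

/* Decide where a P+Q syndrome update would run in this model.
 * An XOR-into-destination (rmw) request always takes the software
 * path, mirroring the guess that hardware cannot do that case. */
enum model_path model_choose_pq_path(const struct model_chan *chan,
                                     int src_cnt, unsigned int flags)
{
    if (flags & MODEL_PQ_XOR_DST)
        return MODEL_PATH_SOFTWARE;   /* rmw: forced software path */
    if (chan == NULL || !chan->has_pq)
        return MODEL_PATH_SOFTWARE;   /* no usable engine */
    if (src_cnt > chan->max_pq_srcs)
        return MODEL_PATH_SOFTWARE;   /* simplification of the real check */
    return MODEL_PATH_HARDWARE;       /* rcw: full calculation can be offloaded */
}
```

In this model a full recalculation with four sources on a PQ-capable channel goes to hardware, while the same request with MODEL_PQ_XOR_DST set stays in software. With rmw_level=0 the upper layer would never set the flag, which is why that setting is a useful test.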
It seems to be generated by that warning here:
http://lxr.free-electrons.com/source/crypto/async_tx/async_tx.c#L173
And indeed, if we dump the status of depend_tx here, it's already been acked.
That doesn't happen if ASYNC_TX_DMA is disabled, hence using the software version of it, instead of relying on our XOR engine. It doesn't happen on any commit prior to the one mentioned above, with the exact same changes applied.
These changes are meant to be contributed, so I can definitely push them somewhere if needed.
I don't really know where to look, though. The change that is causing this is probably the change in ops_run_reconstruct6, but I'm not sure that this partial revert alone would work with regard to the rest of the patch.
Maxime
Markus