Re: [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API
From: Jerin Jacob <hidden>
Date: 2017-01-13 16:22:13
On Fri, Jan 13, 2017 at 03:50:59PM +0000, Ferruh Yigit wrote:
On 1/13/2017 2:57 PM, Jerin Jacob wrote:quoted
On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote:quoted
On 1/13/2017 8:17 AM, Jerin Jacob wrote:quoted
On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:quoted
Hi Jerin, On 1/12/2017 9:17 AM, Jerin Jacob wrote: <...>quoted
+#include <rte_io.h> + /* CSR write macro */ -#define ADF_CSR_WR(csrAddr, csrOffset, val) \ - (void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \ - = (val))) +#define ADF_CSR_WR(csrAddr, csrOffset, val) \ + rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))For IA, this update introduces an extra compiler barrier (rte_io_wmb()), which is indeed not a must, is this correct?AFAIK, Compiler barrier is required for IA. I am not an IA expert, if someone thinks it needs to changed then I can fix it in following commit in this patch series by making rte_io_wmb() and rte_io_rmb() as empty. Let me know. AFAIK, Linux kernel code has a barrier in readl/writel for IA. Typically we don't use any non relaxed versions in fast path.In fast typically all the drivers has explicit write barrier for doorbell write and followed by a relaxed version of write. IMO, In any event, it won't generate performance regression. [dpdk-master] $ git show 70c343bdc8c33a51a9db23cd58122bdfc120a58f commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f Author: Jerin Jacob [off-list ref] Date: Mon Dec 5 06:36:49 2016 +0530 eal/x86: define I/O device memory barriers for IA The patch does not provide any functional change for IA. I/O barriers are mapped to existing smp barriers. CC: Bruce Richardson [off-list ref] CC: Konstantin Ananyev [off-list ref] Signed-off-by: Jerin Jacob [off-list ref]diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.hb/lib/librte_eal/common/include/arch/x86/rte_atomic.h index 00b1cdf..4eac666 100644--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h@@ -61,6 +61,12 @@ extern "C" { #define rte_smp_rmb() rte_compiler_barrier() +#define rte_io_mb() rte_mb() + +#define rte_io_wmb() rte_compiler_barrier() + +#define rte_io_rmb() rte_compiler_barrier() + /*------------------------- 16 bit atomic operations * -------------------------*/ #ifndef RTE_FORCE_INTRINSICSquoted
If so, does it make sense to override these functions for x86, and make rte_writeX = rte_writeX_relaxed rte_readX = rte_readX_relaxedquoted
/* CSR read macro */ -#define ADF_CSR_RD(csrAddr, csrOffset) \ - (*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset))) +#define ADF_CSR_RD(csrAddr, csrOffset) \ + rte_read32((((uint8_t *)csrAddr) + csrOffset))This patchset both introduces new rte_readX/rte_writeX functions, also applies them into drivers. While applying them, it changes the behavior. Like above code was doing a read, but after update it does read and read_memory_barrier. What do you think this patchset updates usage in a manner that keeps behavior exact same. Like using rte_read32_relaxed for this case. And doing architecture related updates in a different patchset?Need to use rte_read32 at this commit otherwise it will break for ARM. That's was all point for this patchset.Why it breaks the ARM, is it because rte_*mb() updated for ARM in this patchset (patch 7/29) ?Yes.quoted
I believe it is good to make these modifications in two phase:It is in two phases only. First introduced the API with implementation and enabled in each driver. Why did you think other-way around it is better?For two things: 1- If something goes wrong, find the source of problem easier.
How? Are you suggesting like this below, 0) Introduce rte_io_?mb() 1) Introduce readxx_relaxed and writexx_relaxed 2) Change all the drivers with readxx_relaxed and writexx_relaxed(15 change sets) to keep the same behavior 3) Introduce readxx and writexx 4) revert step 2 changes and make driver based on readxx and writexx(again 15 change sets) Instead of(existing one) 0) Introduce rte_io_?mb() 1) Introduce readxx_relaxed and writexx_relaxed 2) Introduce readxx and writexx 3) Change all the drivers with readxx_[relaxed] and writexx_[relaxed] Proposed scheme makes driver authors to review two check-ins. And git bisect fail on fourth case of instead of thrird case with existing one. Not sure what we soloving here? Thomas, Any thoughts?
2- Make architectural changes obvious, right now it is a little hard to see, and this again for item 1. But I also would like to hear more comments before you change/try anything.quoted
I can rework and test if there is any value addition. If you concerned about git bisect ability then I don't think we are loosing that in this model. Thoughts?quoted
- First replace old usage with rte_readX/rte_writeX while keeping exact same behavior - Second, do architecture specific changes. Both in eal and drivers level if required. Thanks, ferruhquoted
For performance regression, we can always verify by taking delta between this changeset and the previous changeset. If you think, I need to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that as well.quoted
This both makes easy to see architecture specific updates, and makes easy to trace any possible performance issues by this patchset.quoted
#define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL