Thread (222 messages) 222 messages, 11 authors, 2017-01-19

Re: [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API

From: Jerin Jacob <hidden>
Date: 2017-01-13 16:22:13

On Fri, Jan 13, 2017 at 03:50:59PM +0000, Ferruh Yigit wrote:
On 1/13/2017 2:57 PM, Jerin Jacob wrote:
quoted
On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote:
quoted
On 1/13/2017 8:17 AM, Jerin Jacob wrote:
quoted
On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
quoted
Hi Jerin,

On 1/12/2017 9:17 AM, Jerin Jacob wrote:
<...>
quoted
+#include <rte_io.h>
+
 /* CSR write macro */
-#define ADF_CSR_WR(csrAddr, csrOffset, val) \
-	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
-			= (val)))
+#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
+	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
which is indeed not a must, is this correct?
AFAIK, Compiler barrier is required for IA. I am not an IA expert, if
someone thinks it needs to changed then I can fix it in following commit
in this patch series by making rte_io_wmb() and rte_io_rmb() as empty.

Let me know.

AFAIK, Linux kernel code has a barrier in readl/writel for IA.

Typically we don't use any non relaxed versions in fast path.In fast
typically all the drivers has explicit write barrier for doorbell write
and followed by a relaxed version of write. IMO, In any event, it won't
generate performance regression.

[dpdk-master] $ git show
70c343bdc8c33a51a9db23cd58122bdfc120a58f
commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
Author: Jerin Jacob [off-list ref]
Date:   Mon Dec 5 06:36:49 2016 +0530

    eal/x86: define I/O device memory barriers for IA

    The patch does not provide any functional change for IA.
    I/O barriers are mapped to existing smp barriers.

    CC: Bruce Richardson [off-list ref]
    CC: Konstantin Ananyev [off-list ref]
    Signed-off-by: Jerin Jacob [off-list ref]
diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
index 00b1cdf..4eac666 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
@@ -61,6 +61,12 @@ extern "C" {
 
 #define rte_smp_rmb() rte_compiler_barrier()
 
+#define rte_io_mb() rte_mb()
+
+#define rte_io_wmb() rte_compiler_barrier()
+
+#define rte_io_rmb() rte_compiler_barrier()
+
 /*------------------------- 16 bit atomic operations
 * -------------------------*/
 
 #ifndef RTE_FORCE_INTRINSICS
quoted
If so, does it make sense to override these functions for x86, and make
rte_writeX = rte_writeX_relaxed
rte_readX = rte_readX_relaxed
quoted
 
 /* CSR read macro */
-#define ADF_CSR_RD(csrAddr, csrOffset) \
-	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
+#define ADF_CSR_RD(csrAddr, csrOffset)			\
+	rte_read32((((uint8_t *)csrAddr) + csrOffset))
This patchset both introduces new rte_readX/rte_writeX functions, also
applies them into drivers.

While applying them, it changes the behavior.
Like above code was doing a read, but after update it does read and
read_memory_barrier.

What do you think this patchset updates usage in a manner that keeps
behavior exact same. Like using rte_read32_relaxed for this case.
And doing architecture related updates in a different patchset?
Need to use rte_read32 at this commit otherwise it will break for ARM.
That's was all point for this patchset.
Why it breaks the ARM, is it because rte_*mb() updated for ARM in this
patchset (patch 7/29) ?
Yes.

quoted
I believe it is good to make these modifications in two phase:
It is in two phases only. First introduced the API with implementation and
enabled in each driver. Why did you think other-way around it is better?
For two things:
1- If something goes wrong, find the source of problem easier.
How?

Are you suggesting like this below,
0) Introduce rte_io_?mb()
1) Introduce readxx_relaxed and writexx_relaxed
2) Change all the drivers with readxx_relaxed and writexx_relaxed(15
change sets) to keep the same behavior
3) Introduce readxx and writexx
4) revert step 2 changes and make driver based on readxx and
writexx(again 15 change sets)


Instead of(existing one)
0) Introduce rte_io_?mb()
1) Introduce readxx_relaxed and writexx_relaxed
2) Introduce readxx and writexx
3) Change all the drivers with readxx_[relaxed] and writexx_[relaxed]

Proposed scheme makes driver authors to review two check-ins.
And git bisect fail on fourth case of instead of thrird case with
existing one.

Not sure what we soloving here?

Thomas,
Any thoughts?

2- Make architectural changes obvious, right now it is a little hard to
see, and this again for item 1.

But I also would like to hear more comments before you change/try anything.
quoted
I can rework and test if there is any value addition. If you concerned
about git bisect ability then I don't think we are loosing that in this
model.

Thoughts?
quoted
- First replace old usage with rte_readX/rte_writeX while keeping exact
same behavior

- Second, do architecture specific changes. Both in eal and drivers
level if required.

Thanks,
ferruh
quoted
For performance regression, we can always verify by taking delta
between this changeset and the previous changeset. If you think, I need
to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that
as well.

quoted
This both makes easy to see architecture specific updates, and makes
easy to trace any possible performance issues by this patchset.
quoted
 
 #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
 #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help