Thread: 99 messages, 10 authors, 2019-04-04

Re: [PATCH v3 2/5] ring: add a non-blocking implementation

From: Jerin Jacob Kollanukkaran <hidden>
Date: 2019-01-28 13:34:33

On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
-----Original Message-----
From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
Sent: Wednesday, January 23, 2019 4:16 AM
To: Eads, Gage <redacted>; dev@dpdk.org
Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
[off-list ref]; Richardson, Bruce [off-list ref];
arybchenko@solarflare.com; Ananyev, Konstantin
[off-list ref]
Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
implementation

You can tell this code was written when I thought x86-64 was the only
viable target :). Yes, you are correct.

With regards to using __atomic intrinsics, I'm planning on taking a
similar approach to the functions duplicated in rte_ring_generic.h and
rte_ring_c11_mem.h: one version that uses rte_atomic functions (and
thus stricter memory ordering) and one that uses __atomic intrinsics
(and thus can benefit from more relaxed memory ordering).

What's the advantage of having two different implementations? What is
the disadvantage?

The existing ring buffer code originally had only the "legacy"
implementation, which was kept when the __atomic implementation was
added. The reason claimed was that some older compilers for x86 do not
support GCC __atomic builtins. But I thought there was consensus that
new functionality could have only __atomic implementations.
When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
disabled for thunderx[1] for performance reasons. Assuming that
hasn't changed, the advantage to having two versions is to best
support all of DPDK's platforms. The disadvantage is of course
duplicated code and the additional maintenance burden.

That said, if the thunderx maintainers are ok with it, I'm certainly

The ring code is such a fundamental building block for DPDK, there was
a difference in performance, and there was already legacy code, so
introducing C11_MEM_MODEL was justified IMO.

For the nonblocking implementation, I am happy to test with three
ARM64 microarchitectures and share the C11_MEM_MODEL vs.
non-C11_MEM_MODEL performance results. We may need to consider PPC
here as well. So IMO, we can decide the direction of the new code
based on the overall performance results.
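
To make the distinction under discussion concrete, here is a minimal
sketch of the two styles of head update. It is not the code from the
patch or from rte_ring_generic.h / rte_ring_c11_mem.h: the struct and
function names (demo_head, demo_claim_full_barrier, demo_claim_c11)
are hypothetical, and the GCC __sync builtin is used only as a
stand-in for a full-barrier rte_atomic-style compare-and-set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring's producer head; not DPDK's struct. */
struct demo_head {
	uint32_t head;
};

/*
 * "Legacy"-style claim of n slots. The __sync builtin acts as a full
 * memory barrier, which is stricter than this operation may need.
 */
static void
demo_claim_full_barrier(struct demo_head *h, uint32_t n, uint32_t *old_head)
{
	uint32_t old;

	do {
		/* Re-read the head on every retry. */
		old = *(volatile uint32_t *)&h->head;
	} while (!__sync_bool_compare_and_swap(&h->head, old, old + n));

	*old_head = old;
}

/*
 * C11-style claim of n slots. The memory orders are spelled out, so a
 * weakly ordered CPU is free to emit cheaper barriers.
 */
static void
demo_claim_c11(struct demo_head *h, uint32_t n, uint32_t *old_head)
{
	uint32_t old = __atomic_load_n(&h->head, __ATOMIC_RELAXED);

	/* On failure the builtin refreshes 'old' with the current value. */
	while (!__atomic_compare_exchange_n(&h->head, &old, old + n,
			false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		;

	*old_head = old;
}
```

Both functions leave the head in the same state; they differ only in
the ordering guarantees requested, which is where any per-platform
performance difference would come from.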