Thread (7 messages), 3 authors, 2014-02-25

Ideas/suggestions to avoid repeated locking and reducing too many lists with dmaengine?

From: Joel Fernandes <hidden>
Date: 2014-02-24 22:54:07
Also in: linux-omap, linux-rt-users, lkml

Correcting myself from an earlier post..

On 02/24/2014 04:38 PM, Joel Fernandes wrote:
quoted
quoted
 Also with respect to virt_dma (which is used by edma to manage all the
descriptors and lists) there are too many lists: submitted, issued,
completed etc and the descriptor moves from one to the other. I am
thinking if there is a way we can avoid using so many lists and just
have 2 lists and move the desc from one list to the other. That could
avoid using the intermediate list altogether and classify dma requests
as "done" or "not done".
The reason I created separate submitted and issued lists is that it's
much easier to manage than having everything on a single list.

We could deal with the submitted vs issued list, and that's to have the
channel store the cookie for the last issued descriptor - but I wonder
if it's worth the effort.
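
For illustration only, here is a toy model of the idea above: keep one
ordered queue and let the channel remember the cookie of the last issued
descriptor, instead of moving descriptors between a submitted list and an
issued list. The names Desc, ToyChannel, pending and last_issued are
hypothetical; this is not the virt_dma code, and it ignores locking and
cookie wraparound.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Desc:
    cookie: int  # monotonically increasing, assigned at submit time


@dataclass
class ToyChannel:
    """Hypothetical model of a channel; not the virt_dma data structures."""
    pending: deque = field(default_factory=deque)  # submitted and issued, in order
    next_cookie: int = 1
    last_issued: int = 0  # cookie of the last descriptor handed to hardware

    def submit(self) -> Desc:
        # Queue a descriptor; it is not yet visible to the hardware.
        desc = Desc(self.next_cookie)
        self.next_cookie += 1
        self.pending.append(desc)
        return desc

    def issue_pending(self) -> None:
        # Instead of splicing one list onto another, advance the
        # watermark to the newest queued descriptor.
        if self.pending:
            self.last_issued = self.pending[-1].cookie

    def is_issued(self, desc: Desc) -> bool:
        return desc.cookie <= self.last_issued

    def complete_one(self):
        # Only a descriptor that has been issued can complete.
        if self.pending and self.is_issued(self.pending[0]):
            return self.pending.popleft()
        return None


chan = ToyChannel()
a = chan.submit()
b = chan.submit()
chan.issue_pending()
c = chan.submit()  # queued after the issue, so still only "submitted"
print([chan.is_issued(d) for d in (a, b, c)])
```

The trade-off is the one raised above: one list fewer, at the price of a
cookie comparison wherever the code needs to tell submitted from issued.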

What I'd suggest is to try some profiling, and post some profiling
results which show where the problems are, rather than pointing at
bits of code you might not particularly like.
Actually I did do some tracing before I posted this thread, and I
noticed excessive traces of locking/unlocking. It is very light, though,
as you pointed out, and lighter still without debug options. The only other
notable difference is that the newer kernel goes through the dmaengine
framework, while the older, faster one does not.

One more thing in my trace is omap_dma_sync being called repeatedly in
memcpy_to_io for every barrier call, which is not necessary. I am working
on a fix for this.

On turning off DEBUG_KERNEL and running more tests, I do see some
improvements; however, the throughput reduction is still around 10%.

With a modified openssl speed test app, I sent 16-byte sized block
repeatedly to the AES crypto hardware accelerator using EDMA:

On v3.13.5 kernel:
root@am335x-evm:~# openssl speed -evp aes-128-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-128-cbc for 3s on 16 size blocks: 79902 aes-128-cbc's

With v3.2 kernel,
Doing aes-128-cbc for 3s on 16 size blocks: 92314 aes-128-cbc's

So we're able to encrypt around 13k more ops, or around 4.5k ops/second
with 3.13.5
We're able to encrypt around 13k more ops, or around 4.5k ops/second
with the older 3.2 kernel that didn't use DMAEngine.
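
As a quick check of the arithmetic, the snippet below recomputes the gap
from the two openssl runs quoted above. Only the two operation counts and
the 3 s duration come from the output; the variable names are made up for
this sketch. From these two runs alone the difference comes to about 12.4k
ops, roughly 4.1k ops/second, or about a 13% drop relative to v3.2, so the
rounded figures in the mail should be read as approximate.

```python
# Figures copied from the openssl speed output above (3 s runs, 16-byte blocks).
DURATION_S = 3
ops_v3_13_5 = 79902  # v3.13.5, EDMA through DMAEngine
ops_v3_2 = 92314     # v3.2, no DMAEngine

extra_ops = ops_v3_2 - ops_v3_13_5        # additional ops completed on v3.2
extra_ops_per_s = extra_ops / DURATION_S  # the same gap per second
drop_pct = 100.0 * extra_ops / ops_v3_2   # reduction relative to v3.2

print(f"extra ops on v3.2 in {DURATION_S}s: {extra_ops}")
print(f"extra ops/second on v3.2: {extra_ops_per_s:.0f}")
print(f"throughput drop on v3.13.5: {drop_pct:.1f}%")
```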

Regards,
-Joel