Ideas/suggestions to avoid repeated locking and reducing too many lists with dmaengine?
From: Joel Fernandes <hidden>
Date: 2014-02-24 22:54:07
Also in:
linux-omap, linux-rt-users, lkml
Correcting myself from an earlier post..

On 02/24/2014 04:38 PM, Joel Fernandes wrote:
> > > Also with respect to virt_dma (which is used by edma to manage all the
> > > descriptors and lists), there are too many lists: submitted, issued,
> > > completed etc., and the descriptor moves from one to the other. I am
> > > wondering whether there is a way to avoid using so many lists and just
> > > have 2 lists, moving the desc from one list to the other. That could
> > > avoid using the intermediate list altogether and classify dma requests
> > > as "done" or "not done".
> >
> > The reason I created separate submitted and issued lists is that it's
> > much easier to manage than having everything on a single list. There is
> > a way to deal with the submitted vs issued list, which is to have the
> > channel store the cookie for the last issued descriptor, but I wonder
> > if it's worth the effort.
> >
> > What I'd suggest is to try some profiling, and post some profiling
> > results which show where the problems are, rather than pointing at bits
> > of code you might not particularly like.
>
> Actually I did do some tracing earlier, before I posted this thread, and
> noticed excessive traces of locking/unlocking. It is very light though,
> as you pointed out, and lighter still without debug options. The only
> other notable difference is that the newer kernel goes through the
> dmaengine framework, while the faster (older) one does not.
>
> One more thing in my trace is that omap_dma_sync is called repeatedly in
> memcpy_to_io for every barrier call, which is not necessary. I am working
> on a fix for this.
>
> On turning off DEBUG_KERNEL and running more tests, I do see some
> improvements; however, the throughput reduction is still ~10%.
>
> With a modified openssl speed test app, I sent 16-byte blocks repeatedly
> to the AES crypto hardware accelerator using EDMA:
>
> On v3.13.5 kernel:
>
> root@am335x-evm:~# openssl speed -evp aes-128-cbc -engine cryptodev
> engine "cryptodev" set.
> Doing aes-128-cbc for 3s on 16 size blocks: 79902 aes-128-cbc's
>
> With v3.2 kernel:
>
> Doing aes-128-cbc for 3s on 16 size blocks: 92314 aes-128-cbc's
>
> So we're able to encrypt around 13k more ops, or around 4.5k ops/second
> with 3.13.5

We're able to encrypt around 13k more ops, or around 4.5k ops/second, with
the older 3.2 kernel that didn't use DMAEngine.

Regards,
-Joel
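
For anyone who wants to check the rounded figures, the comparison can be redone from the two `openssl speed` counts above. This is only a quick sketch: the variable names are made up for illustration, and it assumes both runs lasted the 3 seconds reported in the output. It gives a difference of roughly 12.4k ops (about 4.1k ops/second), which is a drop of about 13% relative to the v3.2 result, so the rounded numbers in the post are slightly on the high side for the ops count and slightly on the low side for the percentage.

```python
# Counts copied from the two "Doing aes-128-cbc for 3s" lines in the post.
DURATION_S = 3            # assumed identical for both runs, as printed
ops_v3_13_5 = 79902       # v3.13.5 kernel (uses DMAEngine)
ops_v3_2 = 92314          # v3.2 kernel (does not use DMAEngine)

# Extra operations completed on the older kernel over the whole run.
extra_ops = ops_v3_2 - ops_v3_13_5

# The same difference expressed per second.
extra_ops_per_s = extra_ops / DURATION_S

# Throughput drop of v3.13.5 relative to v3.2, in percent.
drop_pct = 100.0 * extra_ops / ops_v3_2

print(f"extra ops over {DURATION_S}s: {extra_ops}")
print(f"extra ops per second: {extra_ops_per_s:.0f}")
print(f"throughput drop vs v3.2: {drop_pct:.1f}%")
```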