Thread: 140 messages, 14 authors, 2022-10-16

Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

From: Saravana Kannan <hidden>
Date: 2022-10-13 19:00:29
Also in: linux-mm, lkml

On Thu, Oct 13, 2022 at 9:57 AM Catalin Marinas [off-list ref] wrote:
> On Wed, Oct 12, 2022 at 10:45:45AM -0700, Isaac Manjarres wrote:
> > On Fri, Sep 30, 2022 at 07:32:50PM +0100, Catalin Marinas wrote:
> > > I started refreshing the series but I got stuck on having to do bouncing
> > > for small buffers even when they go through the iommu (and I don't
> > > have the setup to test it yet).
> >
> > For devices that go through the IOMMU, are you planning on adding
> > similar logic as you did in the direct-DMA path to bounce the buffer
> > prior to calling into whatever DMA ops are registered for the device?
>
> Yes.
>
> > Also, there are devices with ARM64 CPUs that disable SWIOTLB usage because
> > none of the peripherals that they engage in DMA with need bounce buffering,
> > and also to reclaim the default 64 MB of memory that SWIOTLB uses. With
> > this approach, SWIOTLB usage will become mandatory if those devices need
> > to perform non-coherent DMA transactions that may not necessarily be DMA
> > aligned (e.g. small buffers), correct?
>
> Correct. I've been thinking about this and a way around it is to combine
> the original series (dynamic kmalloc_minalign) with the new one so that
> the arch code can lower the minimum alignment either to 8 if swiotlb is
> available (usually in server space with more RAM) or the cache line size
> if there is no bounce buffer.
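
The policy described in the paragraph above can be modelled in a few lines. This is only a userspace sketch in Python, not kernel code: the function names kmalloc_minalign() and bucket_size() are hypothetical, the 128-byte cache line is an illustrative value, and the bucket model assumes power-of-two caches only (real kmalloc also has 96- and 192-byte caches).

```python
def kmalloc_minalign(swiotlb_available: bool, cache_line_size: int) -> int:
    """Hypothetical policy: 8 bytes if bouncing is possible, else a cache line."""
    return 8 if swiotlb_available else cache_line_size


def bucket_size(request: int, minalign: int) -> int:
    """Smallest power-of-two bucket >= request, never below minalign.

    Simplification: assumes minalign is a power of two and ignores the
    96- and 192-byte kmalloc caches.
    """
    bucket = minalign
    while bucket < request:
        bucket *= 2
    return bucket


# An 8-byte request with and without a bounce buffer, assuming a
# 128-byte cache line (an illustrative value, not taken from the thread).
with_swiotlb = bucket_size(8, kmalloc_minalign(True, 128))
without_swiotlb = bucket_size(8, kmalloc_minalign(False, 128))
print(with_swiotlb, without_swiotlb)
```

Under these assumptions the 8-byte request stays in an 8-byte bucket when a bounce buffer is available and is padded to a full cache line when it is not.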
> > If so, would there be concerns that the memory savings we get back from
> > reducing the memory footprint of kmalloc might be defeated by how much
> > memory is needed for bounce buffering?
>
> It's not necessarily about the saved memory but also locality of the
> small buffer allocations, less cache and TLB pressure.

Part of the pushback we get when we try to move some of the Android
ecosystem from 32-bit to 64-bit is the memory usage increase. So,
while the main goal might not be memory savings, it'll be good to keep
that in mind too. I'd definitely not want this patch series to make
things worse. Ideally, it'd make things better. 10MB is considered a
lot on some of the very low-spec devices.

> > I understand that we can use the
> > "swiotlb=num_slabs" command line parameter to minimize the amount of
> > memory allocated for bounce buffering. If this is the only way to
> > minimize this impact, how much memory would you recommend to allocate
> > for bounce buffering on a system that will only use bounce buffers for
> > non-DMA-aligned buffers?
>
> It's hard to tell, it would need to be guessed by trial and error on
> specific hardware if you want to lower it. Another issue is that IIRC
> the swiotlb is allocated in 2K slots, so you may need a lot more bounce
> buffers than the actual memory allocated.
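
To make the 2K-slot point concrete, here is a rough back-of-the-envelope model in Python. It assumes that every bounced buffer occupies at least one whole 2 KiB slot and that the "swiotlb=" value counts such slots; it ignores fragmentation and contiguity requirements, and the helper names and the workload of 1000 in-flight buffers are made up for illustration.

```python
IO_TLB_SLOT_SIZE = 2048                 # bytes per slot ("2K slots" in the thread)
DEFAULT_POOL_BYTES = 64 * 1024 * 1024   # the default 64 MB mentioned in the thread


def slots_for_buffer(size: int) -> int:
    """Slots consumed by one bounced buffer: whole slots, at least one."""
    return max(1, -(-size // IO_TLB_SLOT_SIZE))  # ceiling division


def pool_bytes(num_slabs: int) -> int:
    """Memory reserved for a pool of num_slabs slots."""
    return num_slabs * IO_TLB_SLOT_SIZE


def pool_bytes_for(in_flight: int, buffer_size: int) -> int:
    """Pool size needed to hold in_flight buffers of buffer_size at once."""
    return pool_bytes(in_flight * slots_for_buffer(buffer_size))


# Hypothetical workload: 1000 concurrent 8-byte buffers.
payload = 1000 * 8
reserved = pool_bytes_for(1000, 8)
print(DEFAULT_POOL_BYTES // IO_TLB_SLOT_SIZE, payload, reserved)
```

In this model each 8-byte buffer ties up a full 2048-byte slot, so the pool must be sized by the number of concurrent mappings rather than by the bytes actually bounced.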

> I wonder whether swiotlb is actually the best option for bouncing
> unaligned buffers. We could use something like mempool_alloc() instead
> if we stick to small buffers rather than any (even large) buffer that's
> not aligned to a cache line. Or just go for kmem_cache_alloc() directly.
> A downside is that we may need GFP_ATOMIC for such allocations, so
> higher risk of failure.

Yeah, a temporary kmem_cache_alloc() to bounce buffers off of feels
like a better idea than swiotlb, especially for small allocations (say
8-byte allocations) that might have gone into the kmem-cache-64 if we
hadn't dropped KMALLOC_MIN_ALIGN to 8.
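
As a rough illustration of what bouncing only unaligned small buffers could mean, the Python sketch below models the decision and the size of the temporary buffer. The alignment test and both function names are assumptions made for illustration; the thread does not spell out the exact check the series uses, and the 64-byte cache line is only an example value.

```python
def needs_bounce(addr: int, size: int, cache_line_size: int) -> bool:
    """Assumed check: bounce if the start or the length is not a
    multiple of the cache line size."""
    return addr % cache_line_size != 0 or size % cache_line_size != 0


def bounce_size(size: int, cache_line_size: int) -> int:
    """Size of a cache-line-aligned temporary buffer that can hold size bytes."""
    return -(-size // cache_line_size) * cache_line_size  # round up


# Example with a 64-byte cache line (illustrative value).
print(needs_bounce(0x1000, 8, 64), bounce_size(8, 64))
print(needs_bounce(0x1000, 128, 64), bounce_size(128, 64))
```

With these assumptions an 8-byte buffer is bounced through a single 64-byte allocation, far smaller than a 2K swiotlb slot, while an aligned 128-byte buffer is mapped directly.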

-Saravana

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel