Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: 2023-06-30 12:30:46
Also in:
lkml
From: Alexander H Duyck <redacted> Date: Thu, 29 Jun 2023 09:45:26 -0700
On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:quoted
Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles even when on DMA-coherent platforms (like x86) with no active IOMMU or swiotlb, just for the call ladder. Indeed, it's
[...]
quoted
@@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page) page_pool_set_dma_addr(page, dma); + if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) && + dma_need_sync(pool->p.dev, dma)) { + pool->p.flags |= PP_FLAG_DMA_SYNC_DEV; + pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC; + } + if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) page_pool_dma_sync_for_device(pool, page, pool->p.max_len);I am pretty sure the logic is flawed here. The problem is dma_needs_sync depends on the DMA address being used. In the worst case scenario we could have a device that has something like a 32b DMA address space on a system with over 4GB of memory. In such a case the higher addresses would need to be synced because they will go off to a swiotlb bounce buffer while the lower addresses wouldn't. If you were to store a flag like this it would have to be generated per page.
I know when DMA might need syncing :D That's the point of this shortcut: if at least one page needs syncing, I disable it for the whole pool. It's a "better safe than sorry". Using a per-page flag involves more changes and might hurt some scenarios/setups. For example, non-coherent systems, where you always need to do syncs. The idea was to give some improvement when possible, otherwise just fallback to what we have today. Thanks, Olek