Thread (32 messages) 32 messages, 4 authors, 2023-08-29

Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA

From: Yunsheng Lin <hidden>
Date: 2023-08-18 09:00:14
Also in: lkml

On 2023/8/18 14:12, Ilias Apalodimas wrote:
On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski [off-list ref] wrote:
quoted
On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
quoted
quoted
Can we assume the DMA mapping of page pool is page aligned? We should
be, right?
Yes
quoted
That means we're storing 12 bits of 0 at the lower end.
So even with 32b of space we can easily store addresses for 32b+12b =>
16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
problem is only in our heads?
Do you mean moving the pp_frag_count there?
Right, IIUC we don't have enough space to fit dma_addr_t and the
refcount, but if we store the dma addr on a shifted u32 instead
of using dma_addr_t explicitly - the refcount should fit?
struct page looks like this:

unsigned long dma_addr;
union {
      unsigned long dma_addr_upper;
      atomic_long_t pp_frag_count;
};

So, on 32bit platforms with 64bit dma we can't support a frag count at all.
We could either use the lower 12 bits (and have support for 4096 frags
'only') or do what you suggest.
Maybe we can rethink about defining a new memdesc for page used by
the network stack. As I took a glance at the Device Memory TCP v2
patchset from Mina, we may use the new memdesc to support more new
memory types, to decupple page_pool from the spcae limitation of
'stuct page', and to support frag page for 32bit platforms with
64bit dma too.
TBH I don't love any of these and since those platforms are rare (or
at least that's what I think), I prefer not supporting them at all.
quoted
quoted
I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
overall.  With Yunshengs patches such a platform would allocate a
page, so why should we prevent it from splitting it internally?
Splitting it is fine, the problem is that the refcount AKA
page->pp_frag_count** counts outstanding PP-aware references
and page->refcount counts PP-unaware references.

If we want to use page->refcount directly we'd need to unmap
the page whenever drivers calls page_pool_defrag_page().
But the driver may assume the page is still mapped afterwards.
What I am suggesting here is to not add the new
PP_FLAG_PAGE_SPLIT_IN_DRIVER flag.  If a driver wants to split pages
internally it should create a pool without
PP_FLAG_DMA_SYNC_DEV to begin with.  The only responsibility the
I am not sure why using PP_FLAG_DMA_SYNC_DEV is conflicted with page
split if the frag page is freed with dma_sync_size being -1 and the
pool->p.max_len is setup correctly.

I was thinking about defining page_pool_put_frag() which corresponds
to page_pool_dev_alloc_frag() and page_pool_free() which corresponds
to page_pool_dev_alloc(), so that driver author does not misuse the
dma_sync_size parameter for page_pool_put_page() related API.

And PP_FLAG_PAGE_SPLIT_IN_DRIVER is used to fail the page_pool creation
when driver is using page->pp_frag_count to split page itself in 32-bit
arch with 64-bit DMA.
driver would have is to elevate the page refcnt so page pool would not
try to free/recycle it.  Since it won't be able to allocate fragments
we don't have to worry about the rest.
quoted
We can change the API to make this behavior explicit. Although
IMHO that's putting the burden of rare platforms on non-rare
platforms which we should avoid.
Yep, agree here.
quoted
** I said it before and I will keep saying this until someone gets
   angry at me - I really think we should rename this field because
   the association with frags is a coincidence.
As my understanding, currently the naming is fine as it is because
page->pp_frag_count is used to indicate the number of frags a page
is split, there is only one user holding onto one frag for now.

But if there are more than one user holding onto one frag, then we
should probably rename it to something like page->pp_ref_count as
Jakub suggested before, for example below patch may enable more than
one user holding onto one frag.

https://patchwork.kernel.org/project/netdevbpf/patch/20230628121150.47778-1-liangchen.linux@gmail.com/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help