Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API
From: Yunsheng Lin <hidden>
Date: 2024-08-20 13:08:07
Also in:
linux-mm, lkml
On 2024/8/19 23:52, Alexander Duyck wrote:
quoted
Yes, the expectation is that somebody else didn't take an access to the page/data to send it off somewhere else between page_frag_alloc_va() and page_frag_alloc_abort(), did you see expectation was broken in that patch? If yes, we should fix that by using page_frag_free_va() related API instead of using page_frag_alloc_abort().The problem is when you expose it to XDP there are a number of different paths it can take. As such you shouldn't be expecting XDP to not do something like that. Basically you have to check the reference
Even if XDP operations like xdp_do_redirect() or tun_xdp_xmit() return failure, we still can not do that? It seems odd that happens. If not, can we use page_frag_alloc_abort() with fragsz being zero to avoid atomic operation?
count before you can rewind the page.quoted
quoted
quoted
quoted
quoted
+static struct page *__page_frag_cache_reload(struct page_frag_cache *nc, + gfp_t gfp_mask) { + struct page *page; + if (likely(nc->encoded_va)) { - if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias)) + page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias); + if (page) goto out; } - if (unlikely(!__page_frag_cache_refill(nc, gfp_mask))) - return false; + page = __page_frag_cache_refill(nc, gfp_mask); + if (unlikely(!page)) + return NULL; out: /* reset page count bias and remaining to start of new frag */ nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; nc->remaining = page_frag_cache_page_size(nc->encoded_va); - return true; + return page; +} +None of the functions above need to be returning page.Are you still suggesting to always use virt_to_page() even when it is not really necessary? why not return the page here to avoid the virt_to_page()?Yes. The likelihood of you needing to pass this out as a page should be low as most cases will just be you using the virtual address anyway. You are essentially trading off branching for not having to use virt_to_page. It is unnecessary optimization.As my understanding, I am not trading off branching for not having to use virt_to_page, the branching is already needed no matter we utilize it to avoid calling virt_to_page() or not, please be more specific about which branching is traded off for not having to use virt_to_page() here.The virt_to_page overhead isn't that high. It would be better to just use a consistent path rather than try to optimize for an unlikely branch in your datapath.
I am not sure if I understand what do you mean by 'consistent path' here.
If I understand your comment correctly, the path is already not consistent
to avoid having to fetch size multiple times multiple ways as mentioned in
[1]. As below, doesn't it seems nature to avoid virt_to_page() calling while
avoiding page_frag_cache_page_size() calling, even if it is an unlikely case
as you mentioned:
struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
unsigned int *offset, unsigned int fragsz,
gfp_t gfp)
{
unsigned int remaining = nc->remaining;
struct page *page;
VM_BUG_ON(!fragsz);
if (likely(remaining >= fragsz)) {
unsigned long encoded_va = nc->encoded_va;
*offset = page_frag_cache_page_size(encoded_va) -
remaining;
return virt_to_page((void *)encoded_va);
}
if (unlikely(fragsz > PAGE_SIZE))
return NULL;
page = __page_frag_cache_reload(nc, gfp);
if (unlikely(!page))
return NULL;
*offset = 0;
nc->remaining -= fragsz;
nc->pagecnt_bias--;
return page;
}
1. https://lore.kernel.org/all/CAKgT0UeQ9gwYo7qttak0UgXC9+kunO2gedm_yjtPiMk4VJp9yQ@mail.gmail.com/ (local)
quoted
quoted
quoted
quoted
quoted
+struct page *page_frag_alloc_pg(struct page_frag_cache *nc, + unsigned int *offset, unsigned int fragsz, + gfp_t gfp) +{ + unsigned int remaining = nc->remaining; + struct page *page; + + VM_BUG_ON(!fragsz); + if (likely(remaining >= fragsz)) { + unsigned long encoded_va = nc->encoded_va; + + *offset = page_frag_cache_page_size(encoded_va) - + remaining; + + return virt_to_page((void *)encoded_va); + } + + if (unlikely(fragsz > PAGE_SIZE)) + return NULL; + + page = __page_frag_cache_reload(nc, gfp); + if (unlikely(!page)) + return NULL; + + *offset = 0; + nc->remaining = remaining - fragsz; + nc->pagecnt_bias--; + + return page; } +EXPORT_SYMBOL(page_frag_alloc_pg);Again, this isn't returning a page. It is essentially returning a bio_vec without calling it as such. You might as well pass the bio_vec pointer as an argument and just have it populate it directly.I really don't think your bio_vec suggestion make much sense for now as the reason mentioned in below: "Through a quick look, there seems to be at least three structs which have similar values: struct bio_vec & struct skb_frag & struct page_frag. As your above agrument about using bio_vec, it seems it is ok to use any one of them as each one of them seems to have almost all the values we are using? Personally, my preference over them: 'struct page_frag' > 'struct skb_frag'quoted
'struct bio_vec', as the naming of 'struct page_frag' seems to best matchthe page_frag API, 'struct skb_frag' is the second preference because we mostly need to fill skb frag anyway, and 'struct bio_vec' is the last preference because it just happen to have almost all the values needed.That is why I said I would be okay with us passing page_frag in patch 12 after looking closer at the code. The fact is it should make the review of that patch set much easier if you essentially just pass the page_frag back out of the call. Then it could be used in exactly the same way it was before and should reduce the total number of lines of code that need to be changed.So the your suggestion changed to something like below? int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag); The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better one in your mind?Well at this point we are populating/getting/pulling a page frag from the page frag cache. Maybe look for a word other than alloc such as populate. Essentially what you are doing is populating the pfrag from the frag cache, although I thought there was a size value you passed for that isn't there?
'struct page_frag' does have a size field, but I am not sure if I understand what do you mean by 'although I thought there was a size value you passed for that isn't there?‘ yet.