Re: [PATCH net-next v5 3/5] page_pool: Allow drivers to hint on SKB recycling

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: 2021-05-14 07:36:32
Also in: bpf, linux-mm, linux-rdma, lkml

[...]

quoted

 	 * using a single memcpy() in __copy_skb_header()
 	 */

@@ -3088,7 +3095,13 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
  */
 static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)

Does it make sure to define a new function like recyclable_skb_frag_unref()
instead of adding the recycle parameter? This way we may avoid checking
skb->pp_recycle for head data and every frag?

We'd still have to check when to run __skb_frag_unref or
recyclable_skb_frag_unref so I am not sure we can avoid that.
In any case I'll have a look

quoted

 {
-	put_page(skb_frag_page(frag));
+	struct page *page = skb_frag_page(frag);
+
+#ifdef CONFIG_PAGE_POOL
+	if (recycle && page_pool_return_skb_page(page_address(page)))
+		return;
+#endif
+	put_page(page);
 }
 
 /**

@@ -3100,7 +3113,7 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-	__skb_frag_unref(&skb_shinfo(skb)->frags[f], false);
+	__skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
 }
 
 /**

@@ -4699,5 +4712,14 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
 #endif
 }
 
+#ifdef CONFIG_PAGE_POOL
+static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
+					struct page_pool *pp)
+{
+	skb->pp_recycle = 1;
+	page_pool_store_mem_info(page, pp);
+}
+#endif
+
 #endif	/* __KERNEL__ */
 #endif	/* _LINUX_SKBUFF_H */

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 24b3d42c62c0..ce75abeddb29 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h

@@ -148,6 +148,8 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
+bool page_pool_return_skb_page(void *data);
+
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
 #ifdef CONFIG_PAGE_POOL

@@ -253,4 +255,11 @@ static inline void page_pool_ring_unlock(struct page_pool *pool)
 		spin_unlock_bh(&pool->ring.producer_lock);
 }
 
+/* Store mem_info on struct page and use it while recycling skb frags */
+static inline
+void page_pool_store_mem_info(struct page *page, struct page_pool *pp)
+{
+	page->pp = pp;
+}
+
 #endif /* _NET_PAGE_POOL_H */

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9de5d8c08c17..fa9f17db7c48 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c

@@ -626,3 +626,26 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
+
+bool page_pool_return_skb_page(void *data)
+{
+	struct page_pool *pp;
+	struct page *page;
+
+	page = virt_to_head_page(data);
+	if (unlikely(page->pp_magic != PP_SIGNATURE))

we have checked the skb->pp_recycle before checking the page->pp_magic,
so the above seems like a likely() instead of unlikely()?

The check here is ! = PP_SIGNATURE. So since we already checked for
pp_recycle, it's unlikely the signature won't match.

quoted

+		return false;
+
+	pp = (struct page_pool *)page->pp;
+
+	/* Driver set this to memory recycling info. Reset it on recycle.
+	 * This will *not* work for NIC using a split-page memory model.
+	 * The page will be returned to the pool here regardless of the
+	 * 'flipped' fragment being in use or not.
+	 */
+	page->pp = NULL;

Why not only clear the page->pp when the page can not be recycled
by the page pool? so that we do not need to set and clear it every
time the page is recycled。

If the page cannot be recycled, page->pp will not probably be set to begin
with. Since we don't embed the feature in page_pool and we require the
driver to explicitly enable it, as part of the 'skb flow', I'd rather keep 
it as is.  When we set/clear the page->pp, the page is probably already in 
cache, so I doubt this will have any measurable impact.

quoted

+	page_pool_put_full_page(pp, virt_to_head_page(data), false);
+
 	C(end);

[...]

quoted

@@ -1725,6 +1734,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	skb->cloned   = 0;
 	skb->hdr_len  = 0;
 	skb->nohdr    = 0;
+	skb->pp_recycle = 0;

I am not sure why we clear the skb->pp_recycle here.
As my understanding, the pskb_expand_head() only allocate new head
data, the old frag page in skb_shinfo()->frags still could be from
page pool， right?

Ah correct! In that case we must not clear skb->pp_recycle.  The new head
will fail on the signature check and end up being freed, while the
remaining frags will be recycled. The *original* head will be
unmapped/recycled (based of the page refcnt)  on the pskb_expand_head()
itself.

quoted

 	atomic_set(&skb_shinfo(skb)->dataref, 1);
 
 	skb_metadata_clear(skb);

@@ -3495,7 +3505,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 		fragto = &skb_shinfo(tgt)->frags[merge];
 
 		skb_frag_size_add(fragto, skb_frag_size(fragfrom));
-		__skb_frag_unref(fragfrom, false);
+		__skb_frag_unref(fragfrom, skb->pp_recycle);
 	}
 
 	/* Reposition in the original skb */
@@ -5285,6 +5295,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	if (skb_cloned(to))
 		return false;
 
+	/* We can't coalesce skb that are allocated from slab and page_pool
+	 * The recycle mark is on the skb, so that might end up trying to
+	 * recycle slab allocated skb->head
+	 */
+	if (to->pp_recycle != from->pp_recycle)
+		return false;

Since we are also depending on page->pp_magic to decide whether to
recycle a page, we could just set the to->pp_recycle according to
from->pp_recycle and do the coalesce?

So I was think about this myself.  This check is a 'leftover' from my
initial version, were I only had the pp_recycle bit + struct page
meta-data (without the signature).  Since that version didn't have the
signature you could not coalesce 2 skb's coming from page_pool/slab. 
We could now do what you suggest, but honestly I can't think of many use
cases that this can happen to begin with.  I think I'd prefer leaving it as
is and adjusting the comment.  If we can somehow prove this happens
oftenly and has a performance impact, we can go ahead and remove it.

[...]

Thanks
/Ilias

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help