Thread (20 messages, 6 authors, 2024-01-18)

Re: [RFC PATCH net-next v5 2/2] net: add netmem to skb_frag_t

From: Mina Almasry <hidden>
Date: 2024-01-12 15:35:53
Also in: lkml

On Fri, Jan 12, 2024 at 3:51 AM Yunsheng Lin [off-list ref] wrote:
On 2024/1/12 8:34, Mina Almasry wrote:
On Thu, Jan 11, 2024 at 4:45 AM Yunsheng Lin [off-list ref] wrote:
On 2024/1/9 9:14, Mina Almasry wrote:

...
+             if (WARN_ON_ONCE(!skb_frag_page(&skb_shinfo(skb)->frags[0]))) {
I really hate to bring it up again.
If you are not willing to introduce a new helper,
I'm actually more than happy to add a new helper like:

static inline netmem_ref skb_frag_netmem();

That would let future callers obtain frag->netmem and use the netmem_ref directly.

What I'm really hung up on is your follow-up request:

"Is it possible to introduce something like skb_frag_netmem() for
netmem? so that we can keep most existing users of skb_frag_page()
unchanged and avoid adding additional checking overhead for existing
users."

With this patch series, skb_frag_t no longer has a page pointer inside
of it; it only has a netmem_ref. The netmem_ref is currently always a
page, but in the future it may not be. Can you clarify how we keep
skb_frag_page() unchanged and without checks? What do you expect
skb_frag_page() and its callers to do? We cannot assume netmem_ref is
always a struct page. I'm happy to implement a change, but I need to
understand it a bit better.
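A minimal userspace sketch of the semantics under discussion: skb_frag_netmem() returns the raw netmem_ref, while skb_frag_page() returns NULL when the frag is not backed by a struct page (the behavior the devmem series would introduce; as of this series it always returns a page). The struct layouts, the low-bit tag `NETMEM_NOT_PAGE` and `netmem_is_page()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (hypothetical, for illustration). */
struct page {
	void *virtual;
};

typedef unsigned long netmem_ref;

/* Hypothetical encoding: low bit set means "not a struct page" (e.g. devmem). */
#define NETMEM_NOT_PAGE 0x1UL

/* Assumed post-patch layout: the frag carries a netmem_ref, not a page pointer. */
typedef struct skb_frag {
	netmem_ref netmem;
	unsigned int len;
	unsigned int offset;
} skb_frag_t;

/* Proposed helper: hand back the raw netmem_ref for netmem-aware callers. */
static inline netmem_ref skb_frag_netmem(const skb_frag_t *frag)
{
	return frag->netmem;
}

/* Hypothetical predicate; relies on struct page being at least 2-byte aligned. */
static inline int netmem_is_page(netmem_ref netmem)
{
	return !(netmem & NETMEM_NOT_PAGE);
}

/* Page-only accessor: NULL when the frag is not backed by a struct page. */
static inline struct page *skb_frag_page(const skb_frag_t *frag)
{
	netmem_ref netmem = skb_frag_netmem(frag);

	if (!netmem_is_page(netmem))
		return NULL;
	return (struct page *)netmem;
}
```

Callers that only pass the reference along would use skb_frag_netmem(); only callers that genuinely need a struct page go through the NULL-returning accessor.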

You did not answer my question that I asked here, and ignoring this
question is preventing us from making any forward progress on this
discussion. What do you expect or want skb_frag_page() to do when
there is no page in the frag?
There are still many existing places that do not expect or handle
skb_frag_page() returning NULL, mostly in drivers that do not
support devmem,
As of this series skb_frag_page() cannot return NULL.

In the devmem series, all places in the core networking stack where
skb_frag_page() may return NULL have been audited.

skb_frag_page() returning NULL in a driver that doesn't support devmem
is not possible. The driver is the one that creates the devmem frags
in the first place. When the driver author adds devmem support, they
should also add support for skb_frag_page() returning NULL.
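A sketch of what handling a NULL skb_frag_page() could look like in a page-only consumer, similar in spirit to the WARN_ON_ONCE()/-EINVAL check this patch adds to kcm. `consume_page_frags()` and the tag-bit encoding are hypothetical, and the types are simplified stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins (hypothetical). */
struct page { int dummy; };
typedef unsigned long netmem_ref;
#define NETMEM_NOT_PAGE 0x1UL /* hypothetical "not a page" tag bit */

typedef struct skb_frag {
	netmem_ref netmem;
	unsigned int len;
	unsigned int offset;
} skb_frag_t;

static struct page *skb_frag_page(const skb_frag_t *frag)
{
	if (frag->netmem & NETMEM_NOT_PAGE)
		return NULL;
	return (struct page *)frag->netmem;
}

/* Hypothetical page-only consumer: rejects frags without a struct page
 * instead of dereferencing NULL further down. */
static int consume_page_frags(const skb_frag_t *frags, int nr_frags)
{
	for (int i = 0; i < nr_frags; i++) {
		if (!skb_frag_page(&frags[i]))
			return -EINVAL;
	}
	/* ... page-based processing would go here ... */
	return 0;
}
```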
what's the point of adding the extra overhead for
those drivers?
There is no overhead with static branches. The checks are no-op unless
the user enables devmem, at which point the devmem connections see no
overhead, and non-devmem connections will see minimal overhead that I
suspect will not show up as a measurable performance issue. If the user is not fine
with that they can simply not enable devmem and continue to not
experience any overhead.
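In the kernel this kind of gating would use a static key (for example DEFINE_STATIC_KEY_FALSE() tested with static_branch_unlikely() and flipped with static_branch_enable()), which leaves a NOP in the fast path until the key is enabled. Userspace has no equivalent, so this sketch models the key with a plain global bool; `devmem_enabled`, the tag bit and the simplified types are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (hypothetical). */
struct page { int dummy; };
typedef unsigned long netmem_ref;
#define NETMEM_NOT_PAGE 0x1UL /* hypothetical "not a page" tag bit */

typedef struct skb_frag {
	netmem_ref netmem;
	unsigned int len;
	unsigned int offset;
} skb_frag_t;

/* Stand-in for a static key. A real static key is patched at runtime, so the
 * disabled case costs no load or conditional branch; this bool does not. */
static bool devmem_enabled;

static struct page *skb_frag_page(const skb_frag_t *frag)
{
	/* Until devmem has been enabled every frag is a page, so the tag
	 * check can be skipped entirely. */
	if (devmem_enabled && (frag->netmem & NETMEM_NOT_PAGE))
		return NULL;
	return (struct page *)frag->netmem;
}
```

Skipping the check while the key is off is only safe because, in that state, no frag can carry a non-page netmem_ref.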
The networking stack should forbid skbs with devmem frags from being forwarded
to drivers that do not support devmem yet. I am not sure whether this is done properly
in your patchset yet. If not, I think you might need to audit every existing
driver to make sure it handles skb_frag_page() returning NULL correctly.
There is no audit required. The devmem frags are generated by the
driver and forwarded to the TCP stack, not the other way around.
do you care to use some
existing API like skb_frag_address_safe()?
skb_frag_address_safe() checks that the page is mapped. In this case,
we are not checking if the frag page is mapped; we would like to make
sure that the skb_frag has a page inside of it in the first place.
Seems like a different check from skb_frag_address_safe().

In fact, skb_frag_address[_safe]() currently assumes that the skb frag
is always a page, so I think I need to squash in this fix:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e59f76151628..bc8b107d0235 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3533,7 +3533,9 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
  */
 static inline void *skb_frag_address(const skb_frag_t *frag)
 {
-       return page_address(skb_frag_page(frag)) + skb_frag_off(frag);
+       return skb_frag_page(frag) ?
+               page_address(skb_frag_page(frag)) + skb_frag_off(frag) :
+               NULL;
 }

 /**
@@ -3545,7 +3547,14 @@ static inline void *skb_frag_address(const skb_frag_t *frag)
  */
 static inline void *skb_frag_address_safe(const skb_frag_t *frag)
 {
-       void *ptr = page_address(skb_frag_page(frag));
+       struct page *page;
+       void *ptr;
+
+       page = skb_frag_page(frag);
+       if (!page)
+               return NULL;
+
+       ptr = page_address(skb_frag_page(frag));
        if (unlikely(!ptr))
                return NULL;
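A userspace model of the NULL-safe skb_frag_address_safe() from the fix above, showing the two distinct failure cases Mina distinguishes: no struct page at all, versus a page without a kernel mapping. `page_address()` here just reads a hypothetical `virtual` field, and the tag-bit encoding is assumed. Unlike the diff, the sketch reuses the `page` local instead of calling skb_frag_page() a second time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a page records its kernel virtual address, or NULL if it has
 * no kernel mapping. */
struct page { void *virtual; };
typedef unsigned long netmem_ref;
#define NETMEM_NOT_PAGE 0x1UL /* hypothetical "not a page" tag bit */

typedef struct skb_frag {
	netmem_ref netmem;
	unsigned int len;
	unsigned int offset;
} skb_frag_t;

/* Stand-in for the kernel's page_address(). */
static void *page_address(const struct page *page)
{
	return page->virtual;
}

static struct page *skb_frag_page(const skb_frag_t *frag)
{
	if (frag->netmem & NETMEM_NOT_PAGE)
		return NULL;
	return (struct page *)frag->netmem;
}

static void *skb_frag_address_safe(const skb_frag_t *frag)
{
	struct page *page = skb_frag_page(frag);
	void *ptr;

	if (!page)		/* no struct page behind this frag */
		return NULL;
	ptr = page_address(page);
	if (!ptr)		/* page exists but is not mapped */
		return NULL;
	return (char *)ptr + frag->offset;
}
```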
+                     ret = -EINVAL;
+                     goto out;
+             }
+
              iov_iter_bvec(&msg.msg_iter, ITER_SOURCE,
-                           skb_shinfo(skb)->frags, skb_shinfo(skb)->nr_frags,
-                           msize);
+                           (const struct bio_vec *)skb_shinfo(skb)->frags,
+                           skb_shinfo(skb)->nr_frags, msize);
I think we need to use some build-time checking to ensure some consistency
between skb_frag_t and bio_vec.
I can add static_assert() that bio_vec->bv_len & bio_vec->bv_offset
are aligned with skb_frag_t->len & skb_frag_t->offset.

I can also maybe add a helper skb_frag_bvec() to do the cast instead
of doing it at the calling site. That may be a bit cleaner.
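A sketch of the build-time checks plus the skb_frag_bvec() helper Mina proposes. The bio_vec layout matches include/linux/bvec.h; the skb_frag_t layout is assumed to mirror this patch, and the body of skb_frag_bvec() is illustrative only. It uses C11 static_assert from <assert.h>; in-kernel code would get static_assert() from <linux/build_bug.h>.

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Layout of struct bio_vec as in include/linux/bvec.h. */
struct bio_vec {
	struct page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

typedef unsigned long netmem_ref;

/* Assumed skb_frag_t layout after this patch. */
typedef struct skb_frag {
	netmem_ref netmem;
	unsigned int len;
	unsigned int offset;
} skb_frag_t;

/* Build-time checks: the cast below is only valid if all of these hold. */
static_assert(sizeof(netmem_ref) == sizeof(struct page *), "netmem_ref must be pointer-sized");
static_assert(sizeof(skb_frag_t) == sizeof(struct bio_vec), "skb_frag_t/bio_vec size mismatch");
static_assert(offsetof(skb_frag_t, netmem) == offsetof(struct bio_vec, bv_page), "netmem vs bv_page");
static_assert(offsetof(skb_frag_t, len) == offsetof(struct bio_vec, bv_len), "len vs bv_len");
static_assert(offsetof(skb_frag_t, offset) == offsetof(struct bio_vec, bv_offset), "offset vs bv_offset");

/* Hypothetical helper centralizing the cast instead of open-coding it at
 * each call site, such as the iov_iter_bvec() call in kcm. */
static inline const struct bio_vec *skb_frag_bvec(const skb_frag_t *frags)
{
	return (const struct bio_vec *)frags;
}
```

Reading an skb_frag_t through a struct bio_vec pointer is only acceptable because the kernel builds with -fno-strict-aliasing; in portable C it would violate the strict-aliasing rules.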
I think the more generic way to do this is to add something like iov_iter_netmem()
instead of doing any cast, so that netmem can be decoupled from bvec completely.
              iov_iter_advance(&msg.msg_iter, txm->frag_offset);

              do {


--
Thanks,
Mina