Thread (20 messages) 20 messages, 6 authors, 2024-01-18

Re: [RFC PATCH net-next v5 2/2] net: add netmem to skb_frag_t

From: Mina Almasry <hidden>
Date: 2024-01-18 13:57:07
Also in: lkml

On Wed, Jan 17, 2024 at 10:54 AM Willem de Bruijn
[off-list ref] wrote:
Mina Almasry wrote:
quoted
On Wed, Jan 17, 2024 at 10:00 AM Mina Almasry [off-list ref] wrote:
quoted
On Tue, Jan 16, 2024 at 4:16 AM Jason Gunthorpe [off-list ref] wrote:
quoted
On Tue, Jan 16, 2024 at 07:04:13PM +0800, Yunsheng Lin wrote:
quoted
On 2024/1/16 8:01, Jason Gunthorpe wrote:
quoted
On Mon, Jan 15, 2024 at 03:23:33PM -0800, Mina Almasry wrote:
quoted
quoted
quoted
You did not answer my question that I asked here, and ignoring this
question is preventing us from making any forward progress on this
discussion. What do you expect or want skb_frag_page() to do when
there is no page in the frag?
I would expect it to do nothing.
I don't understand. skb_frag_page() with an empty implementation just
results in a compiler error as the function needs to return a page
pointer. Do you actually expect skb_frag_page() to unconditionally
cast frag->netmem to a page pointer? That was explained as
unacceptable over and over again by Jason and Christian as it risks
casting devmem to page; completely unacceptable and will get nacked.
Do you have a suggestion of what skb_frag_page() should do that will
not get nacked by mm?
WARN_ON and return NULL seems reasonable?
That's more or less what I'm thinking.
quoted
quoted
While I am agreed that it may be a nightmare to debug the case of passing
a false page into the mm system, but I am not sure what's the point of
returning NULL to caller if the caller is not expecting or handling
the
You have to return something and NULL will largely reliably crash the
thread. The WARN_ON explains in detail why your thread just crashed.
Agreed.
quoted
quoted
NULL returning[for example, most of mm API called by the networking does not
seems to handling NULL as input page], isn't the NULL returning will make
the kernel panic anyway? Doesn't it make more sense to just add a BUG_ON()
depending on some configuration like CONFIG_DEBUG_NET or CONFIG_DEVMEM?
As returning NULL seems to be causing a confusion for the caller of
skb_frag_page() as whether to or how to handle the NULL returning case.
Possibly, though Linus doesn't like BUG_ON on principle..

I think the bigger challenge is convincing people that this devmem
stuff doesn't just open a bunch of holes in the kernel where userspace
can crash it.
It does not, and as of right now there are no pending concerns from
any netdev maintainers regarding mishandled devmem checks at least.
This is because the devmem series comes with a full audit of
skb_frag_page() callers [1] and all areas in the net stack attempting
to access the skb [2].

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-10-almasrymina@google.com/
[2] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-11-almasrymina@google.com/
quoted
The fact you all are debating what to do with skb_frag_page() suggests
to me there isn't confidence...
The debate raging on is related to the performance of skb_frag_page(),
not correctness (and even then, I don't think it's related to
perf...). Yunsheng would like us to optimize skb_frag_page() using an
unconditional cast from netmem to page. This in Yunsheng's mind is a
performance optimization as we don't need to add an if statement
checking if the netmem is a page. I'm resistant to implement that
change so far because:

(a) unconditionally casting from netmem to page negates the compiler
type safety that you and Christian are laying out as a requirement for
the devmem stuff.
(b) With likely/unlikely or static branches the check to make sure
netmem is page is a no-op for existing use cases anyway, so AFAIU,
there is no perf gain from optimizing it out anyway.
Another thought, if anyone else is concerned about the devmem checks
performance,  potentially we could introduce CONFIG_NET_DEVMEM which
when disabled prevents the user from using devmem at all (disables the
netlink API).

When that is disabled, skb_frag_page(), netmem_to_page() & friends can
assume netmem is always a page and do a straight cast between netmem &
page. When it's enabled, it will check that netmem == page before
doing a cast, and return NULL if it is not a page.

I think this is technically viable and I think preserves the compiler
type safety requirements set by mm folks. From my POV though, bloating
the kernel with a a new CONFIG just to optimize out no-op checks seems
unnecessary, but if there is agreement that the checks are a concern,
adding CONFIG_NET_DEVMEM should address it while being acceptable to
mm maintainers.
I agree. A concern with CONFIGs is that what matters in practice is
which default the distros compile with. In this case, adding hurdles
to using the feature likely for no real reason.

Static branches are used throughout the kernel in performance
sensitive paths, exactly because they allow optional paths effectively
for free. I'm quite surprised that this issue is being raised so
strongly here, as they are hardly new or controversial.

But perhaps best is to show with data. Is there a representative page
pool benchmark that exercises the most sensitive case (XDP_DROP?) that
we can run with and without a static branch to demonstrate that any
diff is within the noise?
Yes, Jesper has a page_pool benchmark that he pointed me to in RFC v2:

https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

When running the results on RFCv2, the results were:

net-next @ b44693495af8
https://pastebin.com/raw/JuU7UQXe

+ devmem TCP changes:
https://pastebin.com/raw/mY1L6U4r

On RFC v2 the benchmark showed only a single instruction regression in
the page_pool fast path & the change deemed acceptable to Jesper from
a performance POV [1].

I have not run the benchmark continually on follow up iterations of
the RFC, but I think I'll start doing so in the future.

[1] https://lore.kernel.org/netdev/7aedc5d5-0daf-63be-21bc-3b724cc1cab9@redhat.com/ (local)
quoted
quoted
But none of this is related to correctness. Code calling
skb_frag_page() will fail or crash if it's not handled correctly
regardless of the implementation details of skb_frag_page(). In the
devmem series we add support to handle it correctly via [1] & [2].

--
Thanks,
Mina


--
Thanks,
Mina

-- 
Thanks,
Mina
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help