Re: [RFC PATCH net-next v5 2/2] net: add netmem to skb_frag_t
From: Mina Almasry <hidden>
Date: 2024-01-17 18:01:11
Also in:
lkml
On Tue, Jan 16, 2024 at 4:16 AM Jason Gunthorpe [off-list ref] wrote:
On Tue, Jan 16, 2024 at 07:04:13PM +0800, Yunsheng Lin wrote:quoted
On 2024/1/16 8:01, Jason Gunthorpe wrote:quoted
On Mon, Jan 15, 2024 at 03:23:33PM -0800, Mina Almasry wrote:quoted
quoted
quoted
You did not answer my question that I asked here, and ignoring this question is preventing us from making any forward progress on this discussion. What do you expect or want skb_frag_page() to do when there is no page in the frag?I would expect it to do nothing.I don't understand. skb_frag_page() with an empty implementation just results in a compiler error as the function needs to return a page pointer. Do you actually expect skb_frag_page() to unconditionally cast frag->netmem to a page pointer? That was explained as unacceptable over and over again by Jason and Christian as it risks casting devmem to page; completely unacceptable and will get nacked. Do you have a suggestion of what skb_frag_page() should do that will not get nacked by mm?WARN_ON and return NULL seems reasonable?
That's more or less what I'm thinking.
quoted
While I am agreed that it may be a nightmare to debug the case of passing a false page into the mm system, but I am not sure what's the point of returning NULL to caller if the caller is not expecting or handling theYou have to return something and NULL will largely reliably crash the thread. The WARN_ON explains in detail why your thread just crashed.
Agreed.
quoted
NULL returning[for example, most of mm API called by the networking does not seems to handling NULL as input page], isn't the NULL returning will make the kernel panic anyway? Doesn't it make more sense to just add a BUG_ON() depending on some configuration like CONFIG_DEBUG_NET or CONFIG_DEVMEM? As returning NULL seems to be causing a confusion for the caller of skb_frag_page() as whether to or how to handle the NULL returning case.Possibly, though Linus doesn't like BUG_ON on principle.. I think the bigger challenge is convincing people that this devmem stuff doesn't just open a bunch of holes in the kernel where userspace can crash it.
It does not, and as of right now there are no pending concerns from any netdev maintainers regarding mishandled devmem checks at least. This is because the devmem series comes with a full audit of skb_frag_page() callers [1] and all areas in the net stack attempting to access the skb [2]. [1] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-10-almasrymina@google.com/ [2] https://patchwork.kernel.org/project/netdevbpf/patch/20231218024024.3516870-11-almasrymina@google.com/
The fact you all are debating what to do with skb_frag_page() suggests to me there isn't confidence...
The debate raging on is related to the performance of skb_frag_page(), not correctness (and even then, I don't think it's related to perf...). Yunsheng would like us to optimize skb_frag_page() using an unconditional cast from netmem to page. This in Yunsheng's mind is a performance optimization as we don't need to add an if statement checking if the netmem is a page. I'm resistant to implement that change so far because: (a) unconditionally casting from netmem to page negates the compiler type safety that you and Christian are laying out as a requirement for the devmem stuff. (b) With likely/unlikely or static branches the check to make sure netmem is page is a no-op for existing use cases anyway, so AFAIU, there is no perf gain from optimizing it out anyway. But none of this is related to correctness. Code calling skb_frag_page() will fail or crash if it's not handled correctly regardless of the implementation details of skb_frag_page(). In the devmem series we add support to handle it correctly via [1] & [2]. -- Thanks, Mina