Thread (12 messages) 12 messages, 3 authors, 2026-02-04

Re: [PATCH net-next] net/mlx5e: Undo saving per-channel async ICOSQ

From: Tariq Toukan <hidden>
Date: 2026-02-04 06:20:16


On 03/02/2026 19:54, Alice Mikityanska wrote:
On Tue, Feb 3, 2026, at 16:58, Tariq Toukan wrote:
quoted
On 03/02/2026 11:20, Alice Mikityanska wrote:
quoted
On Mon, Feb 2, 2026, at 18:13, Daniel Borkmann wrote:
quoted
Hi Tariq,

On 2/1/26 1:50 PM, Tariq Toukan wrote:
quoted
On 26/01/2026 11:23, Daniel Borkmann wrote:
quoted
On 1/25/26 9:33 AM, Tariq Toukan wrote:
quoted
On 24/01/2026 0:39, Daniel Borkmann wrote:
quoted
This reverts the following commits:

     - ea945f4f3991 ("net/mlx5e: Move async ICOSQ lock into ICOSQ struct")
     - 56aca3e0f730 ("net/mlx5e: Use regular ICOSQ for triggering NAPI")
     - 1b080bd74840 ("net/mlx5e: Move async ICOSQ to dynamic allocation")
     - abed42f9cd80 ("net/mlx5e: Conditionally create async ICOSQ")

There are a couple of regressions on the xsk side I ran into:

Commit 56aca3e0f730 triggers an illegal synchronize_rcu() in an RCU read-
side critical section via mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq()
-> synchronize_net(). The stack holds RCU read-lock in xsk_poll().

Additionally, this also hits a NULL pointer dereference in mlx5e_xsk_wakeup():

     [  103.963735] BUG: kernel NULL pointer dereference, address: 0000000000000240
     [  103.963743] #PF: supervisor read access in kernel mode
     [  103.963746] #PF: error_code(0x0000) - not-present page
     [  103.963749] PGD 0 P4D 0
     [  103.963752] Oops: Oops: 0000 [#1] SMP
     [  103.963756] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none)
     [  103.963761] Hardware name: [...]
     [  103.963765] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core]

What happens is that c->async_icosq is NULL when in mlx5e_xsk_wakeup()
and therefore access to c->async_icosq->state triggers it. (On the NIC
there is an XDP program installed by the control plane where traffic
gets redirected into an xsk map - there was no xsk pool set up yet.
At some later time a xsk pool is set up and the related xsk socket is
added to the xsk map of the XDP program.)
Thanks for your report.
quoted
Reverting the series fixes the problems again.
Revert is too aggressive here. A fix is preferable.
We're investigating the issue in order to fix it.
We'll update.
Ok, sounds good. Certainly the kTLS fixes seem independent, from the cause
of the issues I've hit it just seemed to me that they were quite fundamental
and that perhaps a different approach would be needed (or alternatively only
kTLS would need fixing, and the xsk optimization left as it was originally).
Anyway, I'll keep the revert locally for now, and happy to test patches.
Please check attached patch.
Hi Tariq,

I also took a glance at the patch: that seems to be correct in XSK parts, and mlx5e_trigger_napi_icosq now makes sense to me after your explanation.
Great.
quoted
Two small comments though:

1. I'd prefer DEBUG_NET_WARN_ON_ONCE instead of WARN_ON. At least XSK wakeup is a hot path, called frequently, so we don't want to flood dmesg with the same error, and it would be better to compile out the check in non-debug builds, hence DEBUG_NET_WARN_ON_ONCE.
I tend to totally drop this check.
quoted
2. I see you changed the condition of creating async_icosq to be created when any xdp_prog is attached, even when there are no XSK pools. Is there a case where async_icosq is useful with plain XDP without XSK? If not, I believe this condition should be XDP && XSK.
You're right.
As you know, the channels are not re-opened on XSK pool events.
They are re-opened on XDP program attach/detach.
Ah, that's right, I see. I didn't pay attention to where the check is
located. If you reorganize the code a little bit, it's possible to
implement the perfect check (XDP && XSK) || kTLS_RX, to avoid creating
the unneeded async_icosq with plain XDP.
Right, but I won't do it here as a fix.
(BTW, I see that ktls_rx_was_enabled is never reset to false: is it
because we can't reliably fence the moment when the driver stops using
it?)
Offload feature can be disabled but offloaded connections can stay alive 
until they're closed.
It is possible to track them and reset ktls_rx_was_enabled to false, but 
it's currently not implemented.
Anyway, it's up to you if you'd like to do it or not, but here is my
suggestion. We don't have to create async_icosq in mlx5e_open_queues.
Similar to how we separated mlx5e_open_xsk, we can separate
mlx5e_open_async_icosq, which is where we can check (XDP && XSK) ||
kTLS_RX.

Then mlx5e_open_channel will look like:

mlx5e_open_queues();
mlx5e_open_async_icosq(); // Checks the condition internally
if (xsk_pool)
         mlx5e_open_xsk();

mlx5e_xsk_enable_locked will also call mlx5e_open_async_icosq just
before mlx5e_open_xsk.

mlx5e_close_async_icosq will close it if it exists.

With this design, all cases are covered: every mlx5e_open_xsk is paired
with mlx5e_open_async_icosq; and kTLS RX enable flow reopens channels,
which also triggers mlx5e_open_async_icosq. The conditional is DRYed in
mlx5e_open_async_icosq itself, so no code repetition either.
Right, it would work, and save the async icosq creation for the case 
(XDP && !XSK).
I prefer to implement this change as a followup, and keep this fix simple.
quoted
Hence, as a simple and quick fix, we will check for XDP program and
create the async ICOSQ accordingly.
Currently we won't have create/destroy async ICOSQ when XSK pool is
activated/deactivated. I will mention this in the commit message.

This still prevents the async ICOSQ creation in default.
quoted
Thanks,
Alice
quoted
quoted
quoted
We were able to repro the issues internally and verify the fix.
We're finalizing it before submission.

I'd be glad if you can confirm it solves the issues for you.
That seems to work for me, yes. Feel free to add my Tested-by.

Thanks,
Daniel
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help