Thread (48 messages) 48 messages, 7 authors, 2024-09-09

Re: [PATCH net-next 5/5] netdev-genl: Support setting per-NAPI config values

From: Samiullah Khawaja <hidden>
Date: 2024-09-03 19:05:09
Also in: lkml

This is great Joe.

On Mon, Sep 2, 2024 at 6:02 PM Jakub Kicinski [off-list ref] wrote:
On Mon, 2 Sep 2024 18:56:07 +0200 Joe Damato wrote:
quoted
quoted
How do you feel about making this configuration opt-in / require driver
changes? What I'm thinking is that having the new "netif_napi_add()"
variant (or perhaps extending netif_napi_set_irq()) to take an extra
"index" parameter would make the whole thing much simpler.
What about extending netif_queue_set_napi instead? That function
takes a napi and a queue index.
This may become problematic, since there may be more queues than NAPIs.
Either with multiple combined queues mapped to a single NAPI, or having
separate Rx/Tx NAPIs.

No strong preference on which function we modify (netif_napi_add vs the
IRQ setting helper) but I think we need to take the index as an
explicit param, rather than try to guess it based on another ID.
The queue ID will match 95% of the time, today. But Intel was
interested in "remapping" queues. And we all think about a queue API...
So using queue ID is going to cause problems down the road.
Do we need a queue to napi association to set/persist napi
configurations? Can a new index param be added to the netif_napi_add
and persist the configurations in napi_storage. I guess the problem
would be the size of napi_storage.

Also wondering if for some use case persistence would be problematic
when the napis are recreated, since the new napi instances might not
represent the same context? For example If I resize the dev from 16
rx/tx to 8 rx/tx queues and the napi index that was used by TX queue,
now polls RX queue.
quoted
Locally I kinda of hacked up something simple that:
  - Allocates napi_storage in net_device in alloc_netdev_mqs
  - Modifies netif_queue_set_napi to:
     if (napi)
       napi->storage = dev->napi_storage[queue_index];

I think I'm still missing the bit about the
max(rx_queues,tx_queues), though :(
You mean whether it's enough to do

        napi_cnt = max(txqs, rxqs)

? Or some other question?

AFAIU mlx5 can only have as many NAPIs as it has IRQs (256?).
Look at ip -d link to see how many queues it has.
We could do txqs + rxqs to be safe, if you prefer, but it will
waste a bit of memory.
quoted
quoted
Index would basically be an integer 0..n, where n is the number of
IRQs configured for the driver. The index of a NAPI instance would
likely match the queue ID of the queue the NAPI serves.
Hmmm. I'm hesitant about the "number of IRQs" part. What if there
are NAPIs for which no IRQ is allocated ~someday~ ?
A device NAPI without an IRQ? Whoever does something of such silliness
will have to deal with consequences. I think it's unlikely.
quoted
It seems like (I could totally be wrong) that netif_queue_set_napi
can be called and work and create the association even without an
IRQ allocated.

I guess the issue is mostly the queue index question above: combined
rx/tx vs drivers having different numbers of rx and tx queues.
Yes, and in the future the ability to allocate more queues than NAPIs.
netif_queue_set_napi() was expected to cover such cases.
quoted
quoted
We can then allocate an array of "napi_configs" in net_device -
like we allocate queues, the array size would be max(num_rx_queue,
num_tx_queues). We just need to store a couple of ints so it will
be tiny compared to queue structs, anyway.

The NAPI_SET netlink op can then work based on NAPI index rather
than the ephemeral NAPI ID. It can apply the config to all live
NAPI instances with that index (of which there really should only
be one, unless driver is mid-reconfiguration somehow but even that
won't cause issues, we can give multiple instances the same settings)
and also store the user config in the array in net_device.

When new NAPI instance is associate with a NAPI index it should get
all the config associated with that index applied.

Thoughts? Does that makes sense, and if so do you think it's an
over-complication?
I think what you are proposing seems fine; I'm just working out the
implementation details and making sure I understand before sending
another revision.
If you get stuck feel free to send a link to a GH or post RFC.
I'm adding extra asks so whether form of review helps.. :)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help