Re: [PATCH net-next 5/5] netdev-genl: Support setting per-NAPI config values
From: Samiullah Khawaja <hidden>
Date: 2024-09-03 19:05:09
Also in:
lkml
This is great Joe. On Mon, Sep 2, 2024 at 6:02 PM Jakub Kicinski [off-list ref] wrote:
On Mon, 2 Sep 2024 18:56:07 +0200 Joe Damato wrote:quoted
quoted
How do you feel about making this configuration opt-in / require driver changes? What I'm thinking is that having the new "netif_napi_add()" variant (or perhaps extending netif_napi_set_irq()) to take an extra "index" parameter would make the whole thing much simpler.What about extending netif_queue_set_napi instead? That function takes a napi and a queue index.This may become problematic, since there may be more queues than NAPIs. Either with multiple combined queues mapped to a single NAPI, or having separate Rx/Tx NAPIs. No strong preference on which function we modify (netif_napi_add vs the IRQ setting helper) but I think we need to take the index as an explicit param, rather than try to guess it based on another ID. The queue ID will match 95% of the time, today. But Intel was interested in "remapping" queues. And we all think about a queue API... So using queue ID is going to cause problems down the road.
Do we need a queue to napi association to set/persist napi configurations? Can a new index param be added to the netif_napi_add and persist the configurations in napi_storage. I guess the problem would be the size of napi_storage. Also wondering if for some use case persistence would be problematic when the napis are recreated, since the new napi instances might not represent the same context? For example If I resize the dev from 16 rx/tx to 8 rx/tx queues and the napi index that was used by TX queue, now polls RX queue.
quoted
Locally I kinda of hacked up something simple that: - Allocates napi_storage in net_device in alloc_netdev_mqs - Modifies netif_queue_set_napi to: if (napi) napi->storage = dev->napi_storage[queue_index]; I think I'm still missing the bit about the max(rx_queues,tx_queues), though :(You mean whether it's enough to do napi_cnt = max(txqs, rxqs) ? Or some other question? AFAIU mlx5 can only have as many NAPIs as it has IRQs (256?). Look at ip -d link to see how many queues it has. We could do txqs + rxqs to be safe, if you prefer, but it will waste a bit of memory.quoted
quoted
Index would basically be an integer 0..n, where n is the number of IRQs configured for the driver. The index of a NAPI instance would likely match the queue ID of the queue the NAPI serves.Hmmm. I'm hesitant about the "number of IRQs" part. What if there are NAPIs for which no IRQ is allocated ~someday~ ?A device NAPI without an IRQ? Whoever does something of such silliness will have to deal with consequences. I think it's unlikely.quoted
It seems like (I could totally be wrong) that netif_queue_set_napi can be called and work and create the association even without an IRQ allocated. I guess the issue is mostly the queue index question above: combined rx/tx vs drivers having different numbers of rx and tx queues.Yes, and in the future the ability to allocate more queues than NAPIs. netif_queue_set_napi() was expected to cover such cases.quoted
quoted
We can then allocate an array of "napi_configs" in net_device - like we allocate queues, the array size would be max(num_rx_queue, num_tx_queues). We just need to store a couple of ints so it will be tiny compared to queue structs, anyway. The NAPI_SET netlink op can then work based on NAPI index rather than the ephemeral NAPI ID. It can apply the config to all live NAPI instances with that index (of which there really should only be one, unless driver is mid-reconfiguration somehow but even that won't cause issues, we can give multiple instances the same settings) and also store the user config in the array in net_device. When new NAPI instance is associate with a NAPI index it should get all the config associated with that index applied. Thoughts? Does that makes sense, and if so do you think it's an over-complication?I think what you are proposing seems fine; I'm just working out the implementation details and making sure I understand before sending another revision.If you get stuck feel free to send a link to a GH or post RFC. I'm adding extra asks so whether form of review helps.. :)