Re: [RFC PATCH net-next v4 2/6] devlink: Extend devlink-rate api with queues and new parameters
From: Jiri Pirko <jiri@resnulli.us>
Date: 2022-09-29 07:13:11
Wed, Sep 28, 2022 at 01:47:03PM CEST, michal.wilczynski@intel.com wrote:
On 9/26/2022 1:51 PM, Jiri Pirko wrote:
Thu, Sep 15, 2022 at 08:41:52PM CEST, michal.wilczynski@intel.com wrote:
On 9/15/2022 5:31 PM, Edward Cree wrote:
On 15/09/2022 14:42, Michal Wilczynski wrote:
Currently devlink-rate only have two types of objects: nodes and leafs. There is a need to extend this interface to account for a third type of scheduling elements - queues. In our use case customer is sending different types of traffic on each queue, which requires an ability to assign rate parameters to individual queues.

Is there a use-case for this queue scheduling in the absence of a netdevice? If not, then I don't see how this belongs in devlink; the configuration should instead be done in two parts: devlink-rate to schedule between different netdevices (e.g. VFs) and tc qdiscs (or some other netdev-level API) to schedule different queues within each single netdevice. Please explain why this existing separation does not support your use-case.

Also I would like to see some documentation as part of this patch. It looks like there's no kernel document for devlink-rate unlike most other devlink objects; perhaps you could add one?

-ed

Hi,

Previously we discussed adding queues to devlink-rate in this thread:
https://lore.kernel.org/netdev/20220704114513.2958937-1-michal.wilczynski@intel.com/T/#u

In our use case we are trying to find a way to expose hardware Tx scheduler tree that is defined per port to user. Obviously if the tree is defined per physical port, all the scheduling nodes will reside on the same tree. Our customer is trying to send different types of traffic that require different QoS levels on the same

Do I understand that correctly, that you are assigning traffic to queues in VM, and you rate the queues on hypervisor? Is that the goal?

Yes.
Why do you have this mismatch? It forces the hypervisor and VM admin to somehow sync upon the configuration. That does not sound correct to me.
VM, but on a different queues. This requires completely different rate setups for that queue - in the implementation that you're mentioning we wouldn't be able to arbitrarily reassign the queue to any node. Those queues would still need to share a single parent - their netdev. This

So that replies to Edward's note about having the queues maintained within the single netdev/vport, correct?

Correct ;)
Okay. So you don't really need any kind of sharing devlink might be able to provide. From what you say and how I see this, it's clear. You should handle the per-queue shaping on the VM, on netdevice level, most probably by offloading some of the TC qdisc.
wouldn't allow us to fully take advantage of the HQoS and would introduce arbitrary limitations. Also I would think that since there is only one vendor implementing this particular devlink-rate API, there is some room for flexibility.

Regarding the documentation, sure. I just wanted to get all the feedback from the mailing list and arrive at the final solution before writing the docs.

BTW, I'm going to be out of office tomorrow, so will respond in this thread on Monday.

BR,
Michał