Re: [RFC PATCH net-next v4 2/6] devlink: Extend devlink-rate api with queues and new parameters
From: Jiri Pirko <jiri@resnulli.us>
Date: 2022-09-29 07:08:30
Wed, Sep 28, 2022 at 01:53:24PM CEST, michal.wilczynski@intel.com wrote:
> On 9/26/2022 1:58 PM, Jiri Pirko wrote:
>> Tue, Sep 20, 2022 at 01:09:04PM CEST, ecree.xilinx@gmail.com wrote:
>>> On 19/09/2022 14:12, Wilczynski, Michal wrote:
>>>> Maybe a switchdev case would be a good parallel here. When you enable switchdev, you get port representors on the host for each VF that is already attached to the VM. Something that gives the host power to configure netdev that it doesn't 'own'. So it seems to me like giving user more power to configure things from the host
>>
>> Well, not really. It gives the user on hypervisor possibility to configure the eswitch vport side. The other side of the wire, which is in VM, is autonomous.
>
> Frankly speaking the VM is still free to assign traffic to queues as before, I guess the networking card scheduling algorithm will just drain those queues at different pace.

That was not my point. My point is that with per-queue shaping you are basically configuring the other side of the wire (the VF), while that config is outside the domain of the hypervisor.

>>>> is acceptable.
>>>
>>> Right that's the thing though: I instinctively want this to be done through representors somehow, because it _looks_ like it ought to be scoped to a single netdev; but that forces the hierarchy to respect netdev boundaries which as we've discussed is an unwelcome limitation.
>>
>> Why exactly? Do you want to share a single queue between multiple vports? Or what exactly would be the usecase where you hit the limitation?
>
> Like you've noticed in previous comment traffic is assigned from inside the VM, this tree simply represents scheduling algorithm in the HW, i.e. how fast the card will drain from each queue. So if you have a queue carrying real-time data, and the rest carrying bulk, you might want to prioritize the real-time data, i.e. put it on a completely different branch on the scheduling tree.

Yep, so, if you forget about how this is implemented in HW/FW, this is the VM-side config, correct?

> BR, Michał

>>>> In my mind this is a device-wide configuration, since the ice driver registers each port as a separate pci device. And each of these devices has its own hardware Tx Scheduler tree global to that port. Queues that we're discussing are actually hardware queues, and are identified by hardware assigned txq_id.
>>>
>>> In general, hardware being a single unit at the device level does not necessarily mean its configuration should be device-wide. For instance, in many NICs each port has a single hardware v-switch, but we do not have some kind of "devlink filter" API to program it directly. Instead we attach TC rules to _many_ netdevs, and driver code transforms and combines these to program the unitary device. "device-wide configuration" originally meant things like firmware version or operating mode (legacy vs. switchdev) that do not relate directly to netdevs. But I agree with you that your approach is the "least evil method"; if properly explained and documented then I don't have any remaining objection to your patch, despite that I'm continuing to take the opportunity to proselytise for "reprs >> devlink" ;)
>>> -ed
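
For readers following the scheduling-tree argument above, here is a rough toy model of why moving one queue to its own branch changes how fast it is drained. It is only a sketch under stated assumptions: the node names, the weights, and the idea that siblings split bandwidth in proportion to an integer weight are all hypothetical, chosen for illustration. It is not the devlink-rate API from this patch set and makes no claim about how the ice hardware scheduler actually arbitrates.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Node:
    """A scheduler node; a node without children stands for one hardware Tx queue."""
    name: str
    weight: int = 1  # hypothetical relative weight among siblings
    children: list["Node"] = field(default_factory=list)


def shares(node, share=Fraction(1)):
    """Map each leaf queue to its fraction of port bandwidth.

    Assumes every queue is permanently backlogged and that siblings
    split their parent's share in proportion to their weights.
    """
    if not node.children:
        return {node.name: share}
    total = sum(child.weight for child in node.children)
    result = {}
    for child in node.children:
        result.update(shares(child, share * child.weight / total))
    return result


# Flat tree: four queues directly under the port, all treated equally.
flat = Node("port", children=[Node(f"txq{i}") for i in range(4)])

# Split tree: txq0 (real-time) sits alone on a heavier branch,
# the three bulk queues share a lighter one.
split = Node("port", children=[
    Node("realtime", weight=3, children=[Node("txq0")]),
    Node("bulk", weight=1, children=[Node(f"txq{i}") for i in range(1, 4)]),
])

for name, tree in (("flat", flat), ("split", split)):
    print(name, {q: str(s) for q, s in shares(tree).items()})
```

In this model the VM still decides which queue each packet goes to; the tree only changes the drain rate per queue, which is the distinction being argued over in the thread.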