Thread: 33 messages, 5 authors, 2022-10-11

Re: [RFC PATCH net-next v4 2/6] devlink: Extend devlink-rate api with queues and new parameters

From: Jiri Pirko <jiri@resnulli.us>
Date: 2022-09-29 07:08:30

Wed, Sep 28, 2022 at 01:53:24PM CEST, michal.wilczynski@intel.com wrote:

On 9/26/2022 1:58 PM, Jiri Pirko wrote:
Tue, Sep 20, 2022 at 01:09:04PM CEST, ecree.xilinx@gmail.com wrote:
On 19/09/2022 14:12, Wilczynski, Michal wrote:
Maybe a switchdev case would be a good parallel here. When you enable switchdev, you get port representors on
the host for each VF that is already attached to the VM. Something that gives the host power to configure
netdev that it doesn't 'own'. So it seems to me like giving user more power to configure things from the host
Well, not really. It gives the user on the hypervisor the possibility
to configure the eswitch vport side. The other side of the wire, which
is in the VM, is autonomous.
Frankly speaking the VM is still free to assign traffic to queues as before,
I guess the networking card scheduling algorithm will just drain those
queues at different pace.
That was not my point. My point is that with per-queue shaping you are
basically configuring the other side of the wire (VF), and this config
is outside the domain of the hypervisor.
is acceptable.
Right that's the thing though: I instinctively Want this to be done
through representors somehow, because it _looks_ like it ought to
be scoped to a single netdev; but that forces the hierarchy to
respect netdev boundaries which as we've discussed is an unwelcome
limitation.
Why exactly? Do you want to share a single queue between multiple vports?
Or what exactly would be the use case where you hit the limitation?
As you noted in a previous comment, traffic is assigned from inside the VM;
this tree simply represents the scheduling algorithm in the HW, i.e. how fast
the card will drain each queue. So if you have one queue carrying real-time
data and the rest carrying bulk, you might want to prioritize the real-time
data, i.e. put it on a completely different branch of the scheduling tree.
Yep, so, if you forget about how this is implemented in HW/FW, this is
the VM-side config, correct?
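
To make the scheduling-tree argument above concrete, here is a minimal sketch
of how placing a real-time queue on its own branch changes how fast it is
drained. Everything in it is an assumption for illustration: the node names,
the weights, and the weighted-share model are hypothetical and are not taken
from the ice driver, the hardware, or the proposed devlink-rate API. It also
assumes every queue always has traffic pending.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a hypothetical Tx scheduling tree; leaves are queues."""
    name: str
    weight: int = 1
    children: List["Node"] = field(default_factory=list)


def leaf_shares(node: Node, share: float = 1.0,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Split a parent's bandwidth share among children by relative weight."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = share  # leaf: a hardware queue
        return out
    total = sum(child.weight for child in node.children)
    for child in node.children:
        leaf_shares(child, share * child.weight / total, out)
    return out


# Hypothetical layout: one real-time queue on its own branch, three bulk
# queues sharing another branch with a lower weight.
tree = Node("port", children=[
    Node("realtime", weight=3, children=[Node("txq0")]),
    Node("bulk", weight=1, children=[Node("txq1"), Node("txq2"), Node("txq3")]),
])

shares = leaf_shares(tree)
for queue, fraction in shares.items():
    print(f"{queue}: {fraction:.3f}")
```

Under these assumed weights the real-time queue is drained at three quarters
of the port rate regardless of how many bulk queues share the other branch,
which is the effect of moving it to a separate branch.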

BR,
Michał
In my mind this is a device-wide configuration, since the ice driver registers each port as a separate PCI device.
Each of these devices has its own hardware Tx Scheduler tree global to that port. The queues that we're
discussing are actually hardware queues, and are identified by a hardware-assigned txq_id.
In general, hardware being a single unit at the device level does
not necessarily mean its configuration should be device-wide.
For instance, in many NICs each port has a single hardware v-switch,
but we do not have some kind of "devlink filter" API to program it
directly.  Instead we attach TC rules to _many_ netdevs, and driver
code transforms and combines these to program the unitary device.
"device-wide configuration" originally meant things like firmware
version or operating mode (legacy vs. switchdev) that do not relate
directly to netdevs.

But I agree with you that your approach is the "least evil method";
if properly explained and documented then I don't have any
remaining objection to your patch, despite that I'm continuing to
take the opportunity to proselytise for "reprs >> devlink" ;)

-ed