Re: [PATCH net-next v10 10/10] ice: add documentation for devlink-rate implementation
From: Wilczynski, Michal <hidden>
Date: 2022-11-09 18:55:07
On 11/8/2022 11:39 PM, Jakub Kicinski wrote:
On Mon, 7 Nov 2022 19:13:26 +0100 Michal Wilczynski wrote:quoted
Add documentation to a newly added devlink-rate feature. Provide some examples on how to use the features, which netlink attributes are supported and descriptions of the attributes. +Devlink Rate +========== + +The ``ice`` driver implements devlink-rate API. It allows for offload of +the Hierarchical QoS to the hardware. It enables user to group Virtual +Functions in a tree structure and assign supported parameters: tx_share, +tx_max, tx_priority and tx_weight to each node in a tree. So effectively +user gains an ability to control how much bandwidth is allocated for each +VF group. This is later enforced by the HW. + +It is assumed that this feature is mutually exclusive with DCB and ADQ, or +any driver feature that would trigger changes in QoS, for example creation +of the new traffic class.Meaning? Will the devlink API no longer reflect reality once one of the VFs enables DCB for example?
By DCB I mean the DCB that's implemented in the FW, and I'm not aware of any flow that would enable the VF to tweak FW DCB on/off. Additionally there is a commit in this patch series that should prevent any devlink-rate changes if the FW DCB is enabled, and should prevent enabling FW DCB enablement if any changes were made with the devlink-rate. I don't think there is a way to detect that the SW DCB is enabled though. In that case the software would try to enforce it's own settings in the SW stack and the HW would try to enforce devlink-rate settings.
quoted
This feature is also dependent on switchdev +being enabled in the system. It's required bacause devlink-rate requires +devlink-port objects to be present, and those objects are only created +in switchdev mode. + +If the driver is set to the switchdev mode, it will export +internal hierarchy the moment the VF's are created. Root of the tree +is always represented by the node_0. This node can't be deleted by the user. +Leaf nodes and nodes with children also can't be deleted. + +.. list-table:: Attributes supported + :widths: 15 85 + + * - Name + - Description + * - ``tx_max`` + - This attribute allows for specifying a maximum bandwidth to beDrop the "This attribute allows for specifying a" from all attrs.
Sure.
quoted
+ consumed by the tree Node. Rate Limit is an absolute number + specifying a maximum amount of bytes a Node may consume during + the course of one second. Rate limit guarantees that a link will + not oversaturate the receiver on the remote end and also enforces + an SLA between the subscriber and network provider. + * - ``tx_share``Wouldn't it be more common to call this tx_min, like in the old VF API and the cgroup APIs?
I agree on this one, I'm not really sure why this attribute is called tx_share. In it's iproute documentation tx_share is described as: "specifies minimal tx rate value shared among all rate objects. If rate object is a part of some rate group, then this value shared with rate objects of this rate group.". So tx_min is more intuitive, but I suspect that the original author wanted to emphasize that this BW is shared among all the children nodes.
quoted
+ - This attribute allows for specifying a minimum bandwidth allocated + to a tree node when it is not blocked. It specifies an absolute + BW. While tx_max defines the maximum bandwidth the node may consume, + the tx_share marks committed BW for the Node. + * - ``tx_priority`` + - This attribute allows for usage of strict priority arbiter among + siblings. This arbitration scheme attempts to schedule nodes based + on their priority as long as the nodes remain within their + bandwidth limit. Range 0-7.Nodes meaning it will (W)RR across all nodes of highest prio?
Yes nodes with the same priority will be treated equally.
Is prio 0 or 7 highest?
7 is the highest, nodes with 7 are selected first.
quoted
+ * - ``tx_weight`` + - This attribute allows for usage of Weighted Fair Queuing + arbitration scheme among siblings. This arbitration scheme can be + used simultaneously with the strict priority. Range 1-200.Would be good to specify how the interaction with SP looks.
In each arbitration flow, the winning node will be selected from the group’s non-blocked siblings with the highest priority. The basic rule is that each sibling group consists of a sub group with high priority nodes, a WFQ sub group with intermediate priority nodes, and a sub group with low priority nodes. So basically if several sibling nodes have same priority configured they are treated as a sub-group and arbitration among them is performed using weights. Will add this info in the documentation.
Does the absolute value of the weight matter or only the relative values? (IOW is 1 vs 10 the same as 10 vs 100)
Only relative values matter. Will also add this to documentation I guess.