Re: [PATCH v4 0/5] lib: add Port Representors
From: Thomas Monjalon <hidden>
Date: 2018-01-15 16:09:59
15/01/2018 13:12, Doherty, Declan:
On 10/01/2018 7:26 PM, Thomas Monjalon wrote:quoted
10/01/2018 14:46, Doherty, Declan:quoted
On 09/01/2018 11:22 PM, Thomas Monjalon wrote:quoted
Hi, 08/01/2018 15:37, Remy Horton:quoted
Port Representors provide a logical presentation in DPDK of VF (virtual function) ports for the purposes of control and monitoring. Each port representor device represents a single VF and is associated with it's parent physical function (PF) PMD which provides the back-end hooks for the representor device ops and defines the control domain to which that port belongs. This allows to use existing DPDK APIs to monitor and control the port without the need to create and maintain VF specific APIs.Extending control plane ability of DPDK has been discussed multiple times.It has, and I have yet to see a really strong reason as to why we would not support control plane functions within DPDK, many of which are already support today implicitly anyway through our ethdev APIs.quoted
The current agreed policy is: " The primary goal of DPDK is to provide a userspace dataplane. Managing VFs from a PF driver is a control plane feature and developers should generally rely on the Linux Kernel for that. " http://dpdk.org/doc/guides/contributing/design.html#pf-and-vf-considerationsMy understanding is that this particular entry was based around the discussion on the divergence of functionality between the Linux kernel PF driver and the DPDK PF driver. I also don't really think the above statement is valid as a blanket statement for the project as it makes the assumption that DPDK is only deployed on Linux hosts, what about FreeBSD? or in the future Windows?Yes, we must agree on removing this scope limitation while working on a generic VF representor.quoted
A number of presentations at both Userspace in Dublin and the Summit in San Jose discussed the support of control plane functionality by DPDK and there wasn't any strong arguments or opposition against using DPDK for control plane functions that I saw. In any case this patchset is not introducing any new control plane APIs that don't already exist within DPDK today, it only enables the creation of a new type of virtual PMDs which are linked to the same base infrastructure and which can be used to represent VFs in a control plane application as we have implemented in this patch set.quoted
If we relax this policy, I think the representor solution should be a real port, not only "for the purposes of control and monitoring". It has been asked several times as replies to this series, but it is kindly ignored, saying it will be thought later.I think we have stated in multiple discussions, especially during the userspace presentation back in September that this solution supports data path on the representors PMDs, and we have used the infrastructure proposed here to do exactly what you are asking. As the representor infrastructure doesn't preclude the support of a data path, we have used it as it is presented here to implement a data path for exception path packets for a prototype vswitch offload implementation.quoted
I don't see a general agreement on this series so far.I think the main issue of contention is that there is a misunderstanding that this implementation only supports control plane management and monitoring, but that is not the case and it can be used for full data path representors, with limited or no control plane functionality if required, at the end of the day the only limitations are based on what is implemented by the backend base driver were the broker is running for the representor ports.The misunderstanding may originates from what you describe (even in v5): "ports for the purposes of control and monitoring"noted, but that is the scope of what we demostrate in the patchset, but we'll update the introduction to reflect the fact that they can be used to also support data path functions, such as exception path traffic for hw switch.quoted
I think everybody agree to have VF representors in DPDK. But there are few things which are not generic enough, and not easy to use. I hoped the discussion started at Dublin would continue on the mailing list but I realize the joint effort with other vendors did not happen. I will elaborate quickly below and more detailed in later review. 1/ In order to keep track of the relations between PF, VF and representor - which are all ethdev - you create a struct outside of ethdev. I think it should be managed inside ethdev API.Initially we had implemented the representor functionality within the context of the ethdev library but ran into a number of scenarios where this didn't work well as it makes the assumption that the base device that the representors are attached to is always an ethdev, we ran into cases were the PF isn't necessarily an ethdev, for example in some smartNICs the PF would be better represented by a switchdev, or it is possible that the device hosting the representor broker could just provide a conduit to a kernel driver.
The base device may be something else than an ethdev PF, so it may be represented as a rte_device. But the VF and representor are still ethdev. I still think the relationship should be described in ethdev.
quoted
As suggested by others, we could also think whether a switchdev API is interesting or not.Indeed if a switchdev is something that is required by the community it would make sense that the representor infrastructure was initialized within the switchdev and not an ethdev. The advantage of keeping the representor infrastructure independent is that it gives the flexibility for representors to be supported independently of device type they are attached to.
switchdev is just another device class. We must abstract the base device as a basic EAL rte_device for now.
quoted
2/ You create a new library for ethdev device management. It is the responsibility of the bus to scan and probe devices. In Intel case, the representor has no real bus so it must rely on the vdev bus. Note that the new custom scan hook may help.This isn't the case in latest versions of the patchset, the bus the representors are dependent on is that of the base device, so for the i40e it's the PF PCI device.
I'm suggesting to use vdev as bus of i40e representor, because there is no PCI identifier for them, right?
quoted
In Mellanox case, the representor already exists in Linux, and is based on PCI bus. Then, as any other port, it must be managed with whitelist or blacklist.I think the suggestion by Yuanhan of using the device whitelist command option makes sense as a option from the commandline, but it would require the newly propose implementation which allows specification of both the bus and device as not all devices are PCI, which have multiple host ports using SR-IOV, but there are cases when an dynamic creation/destruction of ports may also need to be supported, which is what the representor APIs support.
The full proposed syntax of device identification is: http://dpdk.org/ml/archives/dev/2017-December/084572.html With this generic syntax, we can describe port representors and request their initialization via whitelisting. [...]
quoted
3/ You are using PCI address + index to identify the representor. It is a no-go. We have made effort to abstract buses. As an idea, the identification of a representor could use the new proposed flexible device syntax.We are currently using net_representor_%bus%_%device_id%_%vport_id% to identify each representor device but I have no issue changing to either the current convention which would be net_representor_%unique_id% or if I understand the proposal in the RFC "ether: standardize getting the port by name" we would be using something like, we should be looking at something along the lines of net_%bus%_%device_id%_%port_id% which is pretty close to what we are using now.
It is introducing yet another syntax. It would be better to align every device identification usages on a common and standard syntax. Then the syntax parsing will be done in bus and libs (e.g. ethdev) with some helper functions which can be re-used for every usages. The latest version of representors is using direct PCI parsing instead of relying on some bus or ethdev helpers.
In terms of that RFC I'm not clear on if the proposal is just to affect the API for getting a port by name, or actually the name name assigned to the device itself.
It is not about the name. It is a proposal of syntax to describe a resource of a hardware device, or a virtual device. The most obvious usage is to replace -w/-b and --vdev. It can also be used in OVS or any other DPDK configuration.
quoted
4/ Such new API must be experimental.We will address this in the next revisionquoted
I propose to better think the representor API inside ethdev with a good multi-vendor collaboration, and submit a deprecation notice for 18.05 integration.I would really like to see this included as experimental in 18.02 release, if it is agreed by the community that we need to re-integate the representor concept into librte_ethdev during for 18.05 we will support that work.
I don't see the benefit of introducing a lib and a syntax which would be replaced in the next release.