Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink
From: Moshe Shemesh <hidden>
Date: 2018-07-29 10:53:27
Also in:
linux-pci
On Sat, Jul 28, 2018 at 7:06 PM, Bjorn Helgaas [off-list ref] wrote:
On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:quoted
On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko [off-list ref] wrote:quoted
Thu, Jul 26, 2018 at 02:43:59AM CEST, jakub.kicinski@netronome.comwrote:quoted
quoted
quoted
On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:quoted
On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:quoted
On 7/24/2018 10:51 PM, Jakub Kicinski wrote:quoted
quoted
quoted
The devlink params haven't been upstream even for a full cycleandquoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
already you guys are starting to use them to configure standard features like queuing.We developed the devlink params in order to support non-standard configuration only. And for non-standard, there are generic andvendorquoted
quoted
quoted
quoted
quoted
quoted
quoted
specific options.I thought it was developed for performing non-standard andpossiblyquoted
quoted
quoted
quoted
quoted
quoted
vendor specific configuration. Look at DEVLINK_PARAM_GENERIC_*forquoted
quoted
quoted
quoted
quoted
quoted
examples of well justified generic options for which we have no other API. The vendor mlx4 options look fairly vendor specificif youquoted
quoted
quoted
quoted
quoted
quoted
ask me, too. Configuring queuing has an API. The question is it acceptable toenterquoted
quoted
quoted
quoted
quoted
quoted
into the risky territory of controlling offloads via devlinkparametersquoted
quoted
quoted
quoted
quoted
quoted
or would we rather make vendors take the time and effort to model things to (a subset) of existing APIs. The HW never fits the APIs perfectly.I understand what you meant here, I would like to highlight thatthisquoted
quoted
quoted
quoted
quoted
mechanism was not meant to handle SRIOV, Representors, etc. The vendor specific configuration suggested here is to handle acongestionquoted
quoted
quoted
quoted
quoted
state in Multi Host environment (which includes PF and multipleVFs perquoted
quoted
quoted
quoted
quoted
host), where one host is not aware to the other hosts, and each isrunningquoted
quoted
quoted
quoted
quoted
on its own pci/driver. It is a device working mode configuration. This couldn't fit into any existing API, thus creating thisvendor specificquoted
quoted
quoted
quoted
quoted
unique API is needed.If we are just going to start creating devlink interfaces in foreveryquoted
quoted
quoted
quoted
one-off option a device wants to add why did we even bother with trying to prevent drivers from using sysfs? This just feels like we are back to the same arguments we had back in the day with it. I feel like the bigger question here is if devlink is how we aregoingquoted
quoted
quoted
quoted
to deal with all PCIe related features going forward, or should we start looking at creating a new interface/tool for PCI/PCIe related features? My concern is that we have already had features such as DMA Coalescing that didn't really fit into anything and now we are starting to see other things related to DMA and PCIe bus credits. I'm wondering if we shouldn't start looking at a tool/interface to configure all the PCIe related features such as interrupts, error reporting, DMA configuration, power management, etc. Maybe we could even look at sharing it across subsystems and include things like storage, graphics, and other subsystems in the conversation.Agreed, for actual PCIe configuration (i.e. not ECN marking) we do need to build up an API. Sharing it across subsystems would be very cool!I read the thread (starting at [1], for anybody else coming in late) and I see this has something to do with "configuring outbound PCIe buffers", but I haven't seen the connection to PCIe protocol or features, i.e., I can't connect this to anything in the PCIe spec. Can somebody help me understand how the PCI core is relevant? If there's some connection with a feature defined by PCIe, or if it affects the PCIe transaction protocol somehow, I'm definitely interested in this. But if this only affects the data transferred over PCIe, i.e., the data payloads of PCIe TLP packets, then I'm not sure why the PCI core should care.
As you wrote, this is not a PCIe feature or affects the PCIe transaction protocol. Actually, due to hardware limitation in current device, we have enabled a workaround in hardware. This mode is proprietary and not relevant to other PCIe devices, thus is set using driver-specific parameter in devlink
quoted
I wonder howcome there isn't such API in place already. Or is it?quoted
If it is not, do you have any idea how should it look like? Should itbequoted
quoted
an extension of the existing PCI uapi or something completely new? It would be probably good to loop some PCI people in...The closest thing I can think of in terms of answering your questions as to why we haven't seen anything like that would be setpci. Basically with that tool you can go through the PCI configuration space and update any piece you want. The problem is it can have effects on the driver and I don't recall there ever being any sort of notification mechanism added to make a driver aware of configuration updates.setpci is a development and debugging tool, not something we should use as the standard way of configuring things. Use of setpci should probably taint the kernel because the PCI core configures features like MPS, ASPM, AER, etc., based on the assumption that nobody else is changing things in PCI config space.quoted
As far as the interface I don't know if we would want to use something like netlink or look at something completely new. I've gone ahead and added the linux-pci mailing list to the thread. - Alex[1] https://lkml.kernel.org/r/20180719010107.22363-11-saeedm@mellanox.com