Re: [PATCH mlx5-next v7 0/4] Dynamically assign MSI-X vectors count
From: Keith Busch <kbusch@kernel.org>
Date: 2021-03-30 18:11:34
Also in:
linux-pci, linux-rdma
On Mon, Mar 29, 2021 at 08:29:49PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 26, 2021 at 04:01:48PM -0300, Jason Gunthorpe wrote:
> > On Fri, Mar 26, 2021 at 11:50:44AM -0700, Alexander Duyck wrote:
> > > My concern would be that we are defining the user space interface.
> > > Once we have this working as a single operation I could see us
> > > having to support it that way going forward as somebody will script
> > > something not expecting an "offline" sysfs file, and the complaint
> > > would be that we are breaking userspace if we require the use of an
> > > "offline" file.
> >
> > Well, we wouldn't do that. The semantic we define here is that the
> > msix_count interface 'auto-offlines' if that is what is required. If
> > we add some formal offline someday then 'auto-offline' would be a NOP
> > when the device is offline and do the same online/offline sequence as
> > today if it isn't.
>
> Alexander, Keith, any more thoughts on this?
>
> I think I misunderstood Greg's subdirectory comment. We already have
> directories like this:
>
>   /sys/bus/pci/devices/0000:01:00.0/link/
>   /sys/bus/pci/devices/0000:01:00.0/msi_irqs/
>   /sys/bus/pci/devices/0000:01:00.0/power/
>
> and aspm_ctrl_attr_group (for "link") is nicely done with static
> attributes. So I think we could do something like this:
>
>   /sys/bus/pci/devices/0000:01:00.0/   # PF directory
>     sriov/                             # SR-IOV related stuff
>       vf_total_msix
>       vf_msix_count_BB:DD.F        # includes bus/dev/fn of first VF
>       ...
>       vf_msix_count_BB:DD.F        # includes bus/dev/fn of last VF
>
> And I think this could support the mlx5 model as well as the NVMe
> model.
>
> For NVMe, a write to vf_msix_count_* would have to auto-offline the VF
> before asking the PF to assign the vectors, as Jason suggests above.
>
> Before VF Enable is set, the vf_msix_count_* files wouldn't exist and
> we wouldn't be able to assign vectors to VFs; IIUC that's a difference
> from the NVMe interface, but maybe not a terrible one?

Yes, that's fine, nvme can handle this flow. It would be a little easier to avoid nvme user error if we could manipulate the counts prior to VF Enable, but it's really not a problem this way either.

I think it's reasonable for nvme to subscribe to this interface, but I will have to defer to someone with capable nvme devices to implement it.
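
For illustration only, the flow discussed above might look roughly like this from userspace. This is a sketch under assumptions, not an existing interface: the `sriov/` directory and the `vf_total_msix` and `vf_msix_count_BB:DD.F` names come from the proposal quoted in this thread and may not match what is eventually merged. The helper names, the example BDF value, and the check that one VF cannot be given more than the pool total are hypothetical.

```python
import os


def vf_msix_count_path(pf_dir, vf_bdf):
    """Build the per-VF attribute path in the layout proposed in this thread.

    pf_dir is the PF's sysfs directory; vf_bdf is the VF's bus:dev.fn
    suffix, e.g. "01:00.1" (hypothetical value).
    """
    return os.path.join(pf_dir, "sriov", "vf_msix_count_" + vf_bdf)


def set_vf_msix_count(pf_dir, vf_bdf, count):
    """Write an MSI-X vector count for one VF (hypothetical helper)."""
    # Read the size of the pool the PF can hand out to its VFs.
    with open(os.path.join(pf_dir, "sriov", "vf_total_msix")) as f:
        total = int(f.read().strip())

    # Assumption: a single VF cannot be given more than the pool total.
    if not 0 <= count <= total:
        raise ValueError("count %d outside 0..%d" % (count, total))

    path = vf_msix_count_path(pf_dir, vf_bdf)
    # Per the proposal, the per-VF files would not exist before VF Enable
    # is set, so a missing file is reported rather than created.
    if not os.path.exists(path):
        raise FileNotFoundError(path)

    # Under the 'auto-offline' semantic, any offline/online of the VF
    # would happen inside the kernel; userspace only does this one write.
    with open(path, "w") as f:
        f.write(str(count))
```

A caller would pass the PF directory, e.g. `set_vf_msix_count("/sys/bus/pci/devices/0000:01:00.0", "01:00.1", 8)`, after enabling VFs. Keeping it to a single write matches the point made earlier in the thread that scripts should not need to know about a separate "offline" file.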