Re: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2]... | linuxppc-dev

Re: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver

From: Alexey Kardashevskiy <hidden>
Date: 2018-06-08 03:52:13
Also in: kvm

On 8/6/18 1:35 pm, Alex Williamson wrote:

On Fri, 8 Jun 2018 13:09:13 +1000
Alexey Kardashevskiy [off-list ref] wrote:

quoted

On 8/6/18 3:04 am, Alex Williamson wrote:

quoted

On Thu,  7 Jun 2018 18:44:20 +1000
Alexey Kardashevskiy [off-list ref] wrote:

quoted

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7bddf1e..38c9475 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c

@@ -306,6 +306,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 		}
 	}
 
+	if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
+	    pdev->device == 0x1db1 &&
+	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {

Can't we do better than check this based on device ID?  Perhaps PCIe
capability hints at this?

A normal PCI pluggable device looks like this:

root@fstn3:~# sudo lspci -vs 0000:03:00.0
0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Flags: fast devsel, IRQ 497
	Memory at 3fe000000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at 200000000000 (64-bit, prefetchable) [disabled] [size=16G]
	Memory at 200400000000 (64-bit, prefetchable) [disabled] [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19


This is a NVLink v1 machine:

aik@garrison1:~$ sudo lspci -vs 000a:01:00.0
000a:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
	Subsystem: NVIDIA Corporation Device 116b
	Flags: bus master, fast devsel, latency 0, IRQ 457
	Memory at 3fe300000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 260000000000 (64-bit, prefetchable) [size=16G]
	Memory at 260400000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384


This is the one the patch is for:

[aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
(rev a1)
	Subsystem: NVIDIA Corporation Device 1212
	Flags: fast devsel, IRQ 82, NUMA node 8
	Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
	Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Capabilities: [ac0] #23
	Kernel driver in use: vfio-pci


I can only see a new capability #23 which I have no idea about what it
actually does - my latest PCIe spec is
PCI_Express_Base_r3.1a_December7-2015.pdf and that only knows capabilities
till #21, do you have any better spec? Does not seem promising anyway...

You could just look in include/uapi/linux/pci_regs.h and see that 23
(0x17) is a TPH Requester capability and google for that...  It's a TLP
processing hint related to cache processing for requests from system
specific interconnects.  Sounds rather promising.  Of course there's
also the vendor specific capability that might be probed if NVIDIA will
tell you what to look for and the init function you've implemented
looks for specific devicetree nodes, that I imagine you could test for
in a probe as well.


This 23 is in hex:

[aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
(rev a1)
	Subsystem: NVIDIA Corporation Device 1212
	Flags: fast devsel, IRQ 82, NUMA node 8
	Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
	Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Capabilities: [ac0] #23
	Kernel driver in use: vfio-pci

[aik@yc02goos ~]$ sudo lspci -vvvxxxxs 0035:03:00.0 | grep ac0
	Capabilities: [ac0 v1] #23
ac0: 23 00 01 00 de 10 c1 00 01 00 10 00 00 00 00 00


Talking to NVIDIA is always an option :)

quoted

Is it worthwhile to continue with assigning the device in the !ENABLED
case?  For instance, maybe it would be better to provide a weak
definition of vfio_pci_nvlink2_init() that would cause us to fail here
if we don't have this device specific support enabled.  I realize
you're following the example set forth for IGD, but those regions are
optional, for better or worse.


The device is supposed to work even without GPU RAM passed through, this
should look like NVLink v1 in this case (there used to be bugs in the
driver, may be still are, have not checked for a while but there is a bug
opened at NVIDIA about this and they were going to fix that), this is why I
chose not to fail here.

Ok.

quoted

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 24ee260..2725bc8 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig

@@ -30,3 +30,7 @@ config VFIO_PCI_INTX
 config VFIO_PCI_IGD
 	depends on VFIO_PCI
 	def_bool y if X86
+
+config VFIO_PCI_NVLINK2
+	depends on VFIO_PCI
+	def_bool y if PPC_POWERNV

As written, this also depends on PPC_POWERNV (or at least TCE), it's not
a portable implementation that we could re-use on X86 or ARM or any
other platform if hardware appeared for it.  Can we improve that as
well to make this less POWER specific?  Thanks,


As I said in another mail, every P9 chip in that box has some NVLink2 logic
on it so it is not even common among P9's in general and I am having hard
time seeing these V100s used elsewhere in such way.

https://www.redhat.com/archives/vfio-users/2018-May/msg00000.html

Not much platform info, but based on the rpm mentioned, looks like an
x86_64 box.  Thanks,

Wow. Interesting. Thanks for the pointer. No advertising material actually
says that it is P9 only or even mention P9, wiki does not say it is P9 only
either. Hmmm...



-- 
Alexey

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help