Re: [PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps
From: John Hubbard <jhubbard@nvidia.com>
Date: 2021-05-03 18:20:37
Also in:
linux-block, linux-iommu, linux-nvme, linux-pci, lkml
On 5/3/21 9:08 AM, Logan Gunthorpe wrote: ...
quoted
By the way, pre-existing code comment: pci_p2pdma_whitelist[] seems really short. From a naive point of view, I'd expect that there must be a lot more CPUs/chipsets that can do pci p2p, what do you think? I wonder if we have to be so super strict, anyway. It just seems extremely limited, and I suspect there will be some additions to the list as soon as we start to use this.Yes, well unfortunately we have no other way to determine what host bridges can communicate with P2P. We settled on a whitelist when the series was first patch. Nobody likes that situation, but nobody has found anything better. We've been hoping standards bodies would give us a flag but I haven't heard anything about that. At least AMD has been able to guarantee us that all CPUs newer than Zen will support so that covers a large swath. It would be nice if we could say something similar for Intel.
Thanks for explaining the situation!
quoted
OK, yes this avoids taking the pci_bus_sem, but it's kind of cheating. Why is it OK to avoid taking any locks in order to retrieve the first entry from the list, but in order to retrieve any other entry, you have to aquire the pci_bus_sem, and get a reference as well? Something is inconsistent there. The new version here also no longer takes a reference on the device, which is also cheating. But I'm guessing that the unstated assumption here is that there is always at least one entry in the list. But if that's true, then it's better to show clearly that assumption, instead of hiding it in an implicit call that skips both locking and reference counting.Because we hold a reference to a child device of the bus. So the host bus device can't go away until the child device has been released. An earlier version of the P2PDMA patchset had a lot more extraneous get device calls until someone else pointed this out.quoted
You could add a new function, which is a cut-down version of pci_get_slot(), like this, and call this from __host_bridge_whitelist(): /* * A special purpose variant of pci_get_slot() that doesn't take the pci_bus_sem * lock, and only looks for the 00.0 bus-device-function. Once the PCI bus is * up, it is safe to call this, because there will always be a top-level PCI * root device. * * Other assumptions: the root device is the first device in the list, and the * root device is numbered 00.0. */ struct pci_dev *pci_get_root_slot(struct pci_bus *bus) { struct pci_dev *root; unsigned devfn = PCI_DEVFN(0, 0); root = list_first_entry_or_null(&bus->devices, struct pci_dev, bus_list); if (root->devfn == devfn) goto out; root = NULL; out: pci_dev_get(root); return root; } EXPORT_SYMBOL(pci_get_root_slot); ...I think that's a lot clearer to the reader, about what's going on here.Per above, I think the reference count is unnecessary. But I could wrap it in a static function for clarity. (There's no reason to export this function).
Yes, please. thanks, -- John Hubbard NVIDIA