Re: [PATCH v4 1/3] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
From: Zhiping Zhang <hidden>
Date: 2026-05-23 01:04:13
Also in:
dri-devel, kvm, linux-pci, linux-rdma
On Thu, May 21, 2026 at 3:24 PM Alex Williamson [off-list ref] wrote:
quoted
On Thu, 21 May 2026 16:04:12 -0600 Alex Williamson [off-list ref] wrote:quoted
On Tue, 19 May 2026 13:13:49 -0700 Zhiping Zhang [off-list ref] wrote:quoted
Add a dma-buf get_tph callback for exporters to return TPH (TLP Processing Hints) metadata, and add VFIO_DEVICE_FEATURE_DMA_BUF_TPH so userspace can attach that metadata to a VFIO-exported dma-buf.This should be two patches, the first extending the dma-buf framework for the get_tph callback for explicit approval from dma-buf maintainers (who are not even copied here). The second the vfio-pci implementation of get_tph.quoted
8-bit ST and 16-bit Extended ST are distinct PCIe TPH namespaces; the uAPI carries both with explicit validity flags so importers get the value matching their requested width. SET is write-once per dma-buf; the existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI is unchanged.I didn't see what motivated this write-once change, I thought we understood that it was a userspace problem that the tph values need to be set before providing the dma-buf fd to the importer and that races relative to that are a userspace ordering problem. Write-once seems unnecessarily restrictive and there's no justification provided here.quoted
Signed-off-by: Zhiping Zhang <redacted> --- drivers/vfio/pci/vfio_pci_core.c | 3 + drivers/vfio/pci/vfio_pci_dmabuf.c | 134 +++++++++++++++++++++++++++-- drivers/vfio/pci/vfio_pci_priv.h | 12 +++ include/linux/dma-buf.h | 21 +++++ include/uapi/linux/vfio.h | 35 ++++++++ 5 files changed, 198 insertions(+), 7 deletions(-)diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c index 3f8d093aacf8..94aa6dd95701 100644 --- a/drivers/vfio/pci/vfio_pci_core.c +++ b/drivers/vfio/pci/vfio_pci_core.c@@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, return vfio_pci_core_feature_token(vdev, flags, arg, argsz); case VFIO_DEVICE_FEATURE_DMA_BUF: return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz); + case VFIO_DEVICE_FEATURE_DMA_BUF_TPH: + return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg, + argsz); default: return -ENOTTY; }diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c index f87fd32e4a01..be1c65385670 100644 --- a/drivers/vfio/pci/vfio_pci_dmabuf.c +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c@@ -19,7 +19,24 @@ struct vfio_pci_dma_buf { u32 nr_ranges; struct kref kref; struct completion comp; - u8 revoked : 1; + /* + * TPH metadata published by VFIO_DEVICE_FEATURE_DMA_BUF_TPH and + * consumed by the @get_tph dma-buf callback. + * + * @tph_flags is the publish/consume gate: writers populate + * @steering_tag, @steering_tag_ext and @ph first, then store + * @tph_flags with smp_store_release(); readers do + * smp_load_acquire(&tph_flags) before accessing the value fields. + * @tph_flags == 0 means "TPH not set". Writers publish a non-zero + * value only once per dma-buf and serialize via vdev->memory_lock; + * readers stay lockless to avoid AB-BA against the dma_resv_lock held + * by importers. + */Can you outline the ABBA hazard, I'm not seeing it. You're acquiring memory_lock in the feature SET and dma_resv_lock doesn't appear to be held when calling .get_tph(). There's a lot of lockless complication here balanced on this claim of avoiding a hazard that doesn't appear present.quoted
+ u32 tph_flags; + u16 steering_tag_ext; + u8 steering_tag; + u8 ph; + bool revoked;If we still used memory_lock for tph, these could be: u8 tph_st_valid:1; /* memory_lock */ u8 tph_st_ext_valid:1; /* memory_lock */ u8 tph_ph:2; /* memory_lock */ u8 tph_st; u16 tph_st_ext; u8 revoked:1; /* dma_resv_lock */ The existing change of @revoked from bitfield to bool has no rationale noted for it in the commit log.On second thought, what dependency does anything here have on memory_lock? I think we're jumping through hoops to avoid a lock we don't even need. If we just want to serialize SET vs get_tph we could have a mutex on the dma-buf structure, or use RCU if we want to manage it locklessly and make sure get_tph always sees a fully consistent set of values. Thanks, Alex
Agreed, we don't need memory_lock in this path. For v5 I'll instead add a struct mutex lock to struct vfio_pci_dma_buf and take it in SET, get_tph, and around the priv->vdev = NULL store in cleanup. Thanks, Zhiping