Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
From: Ratheesh Kannoth <hidden>
Date: 2026-06-08 02:25:57
Also in:
lkml
On 2026-06-05 at 12:02:37, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
There is only one admin-function PCI device per system. Reject any additional AF probe with -EBUSY so the driver model matches hardware and automated reviewers can rely on a single bound instance. Signed-off-by: Ratheesh Kannoth <redacted>
quoted
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c@@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu) kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN); } +static atomic_t device_bound = ATOMIC_INIT(0); + static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct device *dev = &pdev->dev; struct rvu *rvu; int err; + if (atomic_cmpxchg(&device_bound, 0, 1) != 0) { + dev_warn(dev, "Only one af device is supported.\n"); + return -EBUSY; + } +Once this atomic_cmpxchg() succeeds, where is device_bound ever reset back to 0? The flag is set on entry but the function does not clear it on any of the err_* failure labels, and rvu_remove() also does not touch it. Two scenarios appear to leave the driver permanently unusable until module unload: 1) Any transient probe failure after the cmpxchg (devm_kzalloc returning NULL, pci_enable_device(), dma_set_mask_and_coherent(), pcim_iomap(), rvu_setup_hw_resources(), rvu_mbox_init(), rvu_register_interrupts(), rvu_register_dl(), rvu_enable_sriov(), etc.) returns via an err_* label with device_bound still set to 1. Wouldn't every subsequent probe attempt — including deferred-probe retries on the same device — then fail with -EBUSY even though nothing is bound? 2) A successful probe followed by sysfs unbind, PCI hot-remove, or a driver rebind goes through rvu_remove(), which tears down the rvu instance but does not clear device_bound. Doesn't this break rebinding the same device (echo <bdf> > /sys/bus/pci/drivers/rvu_af/bind) and PCI error-recovery flows for the lifetime of the module load? The commit message describes the intent as rejecting "any additional AF probe" so there is "a single bound instance", which reads as one bound at a time. Should device_bound be cleared (atomic_set(&device_bound, 0)) on every err_* path in rvu_probe() and at the end of rvu_remove() to match that intent?
You are completely correct. The current implementation lacks transactional rollback for device_bound in the downstream rvu_probe() error paths, as well as the corresponding reset in rvu_remove(). The inclusion of atomic_cmpxchg() here is a proactive sanity check to enforce the hardware paradigm, as firmware instantiates only a single Admin Function (AF) PCI device. However, full error-handling path hardening and proper resource cleanup for the AF driver are currently incomplete across the broader initialization sequence. To prevent scope creep in this series, we plan to address the comprehensive error-path rollback—including proper atomic_set(&device_bound, 0) invocations on probe failure and driver detachment—in a dedicated, subsequent hardening patchset targeted for net-next.