Thread (28 messages) 28 messages, 2 authors, 9d ago

Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe

From: Ratheesh Kannoth <hidden>
Date: 2026-06-08 02:25:57
Also in: lkml

On 2026-06-05 at 12:02:37, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
There is only one admin-function PCI device per system.
Reject any additional AF probe with -EBUSY so the driver model matches
hardware and automated reviewers can rely on a single bound instance.

Signed-off-by: Ratheesh Kannoth <redacted>
quoted
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu)
 		kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN);
 }

+static atomic_t device_bound = ATOMIC_INIT(0);
+
 static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct device *dev = &pdev->dev;
 	struct rvu *rvu;
 	int    err;

+	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
+		dev_warn(dev, "Only one af device is supported.\n");
+		return -EBUSY;
+	}
+
Once this atomic_cmpxchg() succeeds, where is device_bound ever reset
back to 0?
The flag is set on entry but the function does not clear it on any of
the err_* failure labels, and rvu_remove() also does not touch it.
Two scenarios appear to leave the driver permanently unusable until
module unload:
1) Any transient probe failure after the cmpxchg (devm_kzalloc returning
  NULL, pci_enable_device(), dma_set_mask_and_coherent(), pcim_iomap(),
  rvu_setup_hw_resources(), rvu_mbox_init(), rvu_register_interrupts(),
  rvu_register_dl(), rvu_enable_sriov(), etc.) returns via an err_*
  label with device_bound still set to 1. Wouldn't every subsequent
  probe attempt — including deferred-probe retries on the same device
  — then fail with -EBUSY even though nothing is bound?
2) A successful probe followed by sysfs unbind, PCI hot-remove, or a
  driver rebind goes through rvu_remove(), which tears down the rvu
  instance but does not clear device_bound. Doesn't this break
  rebinding the same device (echo <bdf> > /sys/bus/pci/drivers/rvu_af/bind)
  and PCI error-recovery flows for the lifetime of the module load?
The commit message describes the intent as rejecting "any additional AF
probe" so there is "a single bound instance", which reads as one bound
at a time. Should device_bound be cleared (atomic_set(&device_bound, 0))
on every err_* path in rvu_probe() and at the end of rvu_remove() to
match that intent?
You are completely correct. The current implementation lacks
transactional rollback for device_bound in the downstream
rvu_probe() error paths, as well as the corresponding reset
in rvu_remove().

The inclusion of atomic_cmpxchg() here is a proactive sanity
check to enforce the hardware paradigm, as firmware instantiates
only a single Admin Function (AF) PCI device. However, full
error-handling path hardening and proper resource cleanup for
the AF driver are currently incomplete across the broader
initialization sequence.

To prevent scope creep in this series, we
plan to address the comprehensive error-path rollback—including
proper atomic_set(&device_bound, 0) invocations on probe failure
and driver detachment—in a dedicated, subsequent hardening
patchset targeted for net-next.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help