Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search... | netdev

Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control

From: Ratheesh Kannoth <hidden>
Date: 2026-06-08 02:28:36
Also in: lkml

On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@marvell.com) wrote:

CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation. Lower-numbered subbanks have
higher priority than higher-numbered ones.

Add a runtime "srch_order" to control the order in which
subbanks are searched during MCAM allocation.

Signed-off-by: Ratheesh Kannoth <redacted>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

quoted

+		USED = 1,
+	};
+
+	if (cnt != npc_priv.num_subbanks) {
+		dev_err(rvu->dev, "Number of entries(%u) != %u\n",
+			cnt, npc_priv.num_subbanks);
+		return -EINVAL;
+	}
+
+	mutex_lock(&mcam->lock);
+	npc_lock_all_subbank();

Can the encoding of subbank_srch_order[] be reconciled between
initialization and this setter?
The boot-time fill done by npc_create_srch_order() and npc_subbank_init()
populates the array as subbank_srch_order[subbank_idx] = priority (i.e.
indexed by subbank, value is priority). For cnt = 8 it produces
[3, 5, 2, 6, 1, 7, 0, 4], whose actual hardware search order is
sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5.
The new setter writes the same array back as
subbank_srch_order[priority] = subbank_idx (indexed by priority, value is
subbank):
   for (int i = 0; i < cnt; i++)
       subbank_srch_order[i] = (u32)narr[i];
If a user runs devlink dev param show (which calls the GET path returning
the boot-time encoding) and then re-applies the same array via
devlink dev param set, the SET path treats it as a permutation of subbank
indices and produces search order sb3, sb5, sb2, sb6, sb1, sb7, sb0, sb4
instead of the previous order.
The validator also checks that arr[i] in 0..sz-1 occurs exactly once each,
which matches the SET form but is also true of the boot-time content
(subbank_srch_order[7] == 4 in the example), so the validator does not
catch the round-trip mismatch.
Should both writers agree on a single encoding so that show + set
round-trips are stable?

Agree, but npc_create_srch_order() and npc_subbank_init() are not added by this function
and is preexisting issue. WIll send patch to "net" tree soon after this series
is merged.

+			xa_erase(&npc_priv.xa_sb_used, prio);

quoted

+	}
+
+	for (int i = 0; i < cnt; i++)
+		subbank_srch_order[i] = (u32)narr[i];
+
+	restrict_valid = false;

Is the unconditional restrict_valid = false here intentional, and is
there a way to undo it?

No.
This is intentional.

The cleanup loop directly above does not only erase newly-added entries.
The check rsrc[FREE][prio] == 1 (and the USED counterpart) also fires for
slots whose successful iteration replaced a pre-existing entry whose
paired decrementing iteration never ran due to the failure.
For example, with sb 0 FREE arr_idx = 0 and sb 1 FREE arr_idx = 1 and
narr = [1, 0]:
 - iter prio = 0 succeeds: xa_sb_free[0] is replaced from 0 to 1,
   rsrc[FREE][0] becomes +1, rsrc[FREE][1] becomes -1.
 - iter prio = 1 fails.
 - cleanup loop sees rsrc[FREE][0] == 1 and erases xa_sb_free[0].
The rollback loop then calls xa_store(xa_sb_free, save[0] = 0, sb 0)
against an empty slot, which does need a fresh xarray node under
GFP_KERNEL. If that allocation fails, the only response is WARN(!!err, ...)
and the function returns the original error.
After such a failure, sb->arr_idx is restored but the corresponding
xa_sb_used / xa_sb_free entry is gone. Would future MCAM allocations via
npc_subbank_noref_alloc() then silently skip those subbanks? Could
xa_reserve() before the destructive phase, or not pre-erasing entries in
the cleanup loop, make rollback unconditionally safe?

Suppose there are only two subbnaks and both are free.

When first loop runs.

rsrc[FREE][0] = 1 - 1 = 0
rsrc[FREE][1] = 1 - 1 = 0

xa_erse check if any value is positive, in this case nothing is positive, nothing
will be removed. So rollback will be safe.

quoted

+
+	npc_unlock_all_subbank();
+	mutex_unlock(&mcam->lock);
+
+	return rc;
+}

Is the locking on the allocator side sufficient to make this setter
atomic with respect to MCAM allocation?

Yes. allocation is done in mbox handler; which is protected by mcam->lock.

Is the locking on the allocator side sufficient to make this setter
atomic with respect to MCAM allocation?
The setter takes mcam->lock plus all per-subbank locks and rewrites
xa_sb_used / xa_sb_free, modifies every sb->arr_idx, and toggles
restrict_valid.
The allocation hot path
rvu_mbox_handler_npc_mcam_alloc_entry()
 -> npc_mcam_alloc_entries()
   -> npc_cn20k_ref_idx_alloc()
     -> npc_subbank_noref_alloc()
iterates xa_for_each(&npc_priv.xa_sb_used, ...) and
xa_for_each(&npc_priv.xa_sb_free, ...) without mcam->lock held
(npc_mcam_alloc_entries() takes mcam->lock only after
npc_cn20k_ref_idx_alloc() returns) and without holding any per-subbank
lock between iterations (the per-subbank lock is taken inside
npc_subbank_alloc() / npc_subbank_suits() and released before the next
iteration step).
xa_for_each is RCU-safe so this is not a use-after-free, but a concurrent
SET can re-key xarray entries while the allocator is mid-walk, causing
the allocator to visit some subbanks twice and skip others entirely.
restrict_valid is also a plain bool read by npc_subbank_restrict_usage()
and npc_defrag_skip_restricted_sb() and a plain bool write by the setter,
with no READ_ONCE / WRITE_ONCE.
Should the allocator iteration also hold mcam->lock (or the relevant
subbank locks) for the duration of the walk, and should restrict_valid
use READ_ONCE / WRITE_ONCE?

mcam_allocation() is done under mutex_unlock(&mcam->lock); in mbox handler.
Mbox are handled serially by AF. So this case wont' happen.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help