Thread (28 messages) 28 messages, 2 authors, 17d ago

Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control

From: Ratheesh Kannoth <hidden>
Date: 2026-06-08 02:28:36
Also in: lkml

On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation. Lower-numbered subbanks have
higher priority than higher-numbered ones.

Add a runtime "srch_order" to control the order in which
subbanks are searched during MCAM allocation.

Signed-off-by: Ratheesh Kannoth <redacted>
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says
quoted
+		USED = 1,
+	};
+
+	if (cnt != npc_priv.num_subbanks) {
+		dev_err(rvu->dev, "Number of entries(%u) != %u\n",
+			cnt, npc_priv.num_subbanks);
+		return -EINVAL;
+	}
+
+	mutex_lock(&mcam->lock);
+	npc_lock_all_subbank();
Can the encoding of subbank_srch_order[] be reconciled between
initialization and this setter?
The boot-time fill done by npc_create_srch_order() and npc_subbank_init()
populates the array as subbank_srch_order[subbank_idx] = priority (i.e.
indexed by subbank, value is priority). For cnt = 8 it produces
[3, 5, 2, 6, 1, 7, 0, 4], whose actual hardware search order is
sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5.
The new setter writes the same array back as
subbank_srch_order[priority] = subbank_idx (indexed by priority, value is
subbank):
   for (int i = 0; i < cnt; i++)
       subbank_srch_order[i] = (u32)narr[i];
If a user runs devlink dev param show (which calls the GET path returning
the boot-time encoding) and then re-applies the same array via
devlink dev param set, the SET path treats it as a permutation of subbank
indices and produces search order sb3, sb5, sb2, sb6, sb1, sb7, sb0, sb4
instead of the previous order.
The validator also checks that arr[i] in 0..sz-1 occurs exactly once each,
which matches the SET form but is also true of the boot-time content
(subbank_srch_order[7] == 4 in the example), so the validator does not
catch the round-trip mismatch.
Should both writers agree on a single encoding so that show + set
round-trips are stable?
Agree, but npc_create_srch_order() and npc_subbank_init() are not added by this function
and is preexisting issue. WIll send patch to "net" tree soon after this series
is merged.
+			xa_erase(&npc_priv.xa_sb_used, prio);
quoted
+	}
+
+	for (int i = 0; i < cnt; i++)
+		subbank_srch_order[i] = (u32)narr[i];
+
+	restrict_valid = false;
Is the unconditional restrict_valid = false here intentional, and is
there a way to undo it?
No.
This is intentional.
The cleanup loop directly above does not only erase newly-added entries.
The check rsrc[FREE][prio] == 1 (and the USED counterpart) also fires for
slots whose successful iteration replaced a pre-existing entry whose
paired decrementing iteration never ran due to the failure.
For example, with sb 0 FREE arr_idx = 0 and sb 1 FREE arr_idx = 1 and
narr = [1, 0]:
 - iter prio = 0 succeeds: xa_sb_free[0] is replaced from 0 to 1,
   rsrc[FREE][0] becomes +1, rsrc[FREE][1] becomes -1.
 - iter prio = 1 fails.
 - cleanup loop sees rsrc[FREE][0] == 1 and erases xa_sb_free[0].
The rollback loop then calls xa_store(xa_sb_free, save[0] = 0, sb 0)
against an empty slot, which does need a fresh xarray node under
GFP_KERNEL. If that allocation fails, the only response is WARN(!!err, ...)
and the function returns the original error.
After such a failure, sb->arr_idx is restored but the corresponding
xa_sb_used / xa_sb_free entry is gone. Would future MCAM allocations via
npc_subbank_noref_alloc() then silently skip those subbanks? Could
xa_reserve() before the destructive phase, or not pre-erasing entries in
the cleanup loop, make rollback unconditionally safe?
Suppose there are only two subbnaks and both are free.

When first loop runs.

rsrc[FREE][0] = 1 - 1 = 0
rsrc[FREE][1] = 1 - 1 = 0

xa_erse check if any value is positive, in this case nothing is positive, nothing
will be removed. So rollback will be safe.
quoted
+
+	npc_unlock_all_subbank();
+	mutex_unlock(&mcam->lock);
+
+	return rc;
+}
Is the locking on the allocator side sufficient to make this setter
atomic with respect to MCAM allocation?
Yes. allocation is done in mbox handler; which is protected by mcam->lock.
Is the locking on the allocator side sufficient to make this setter
atomic with respect to MCAM allocation?
The setter takes mcam->lock plus all per-subbank locks and rewrites
xa_sb_used / xa_sb_free, modifies every sb->arr_idx, and toggles
restrict_valid.
The allocation hot path
rvu_mbox_handler_npc_mcam_alloc_entry()
 -> npc_mcam_alloc_entries()
   -> npc_cn20k_ref_idx_alloc()
     -> npc_subbank_noref_alloc()
iterates xa_for_each(&npc_priv.xa_sb_used, ...) and
xa_for_each(&npc_priv.xa_sb_free, ...) without mcam->lock held
(npc_mcam_alloc_entries() takes mcam->lock only after
npc_cn20k_ref_idx_alloc() returns) and without holding any per-subbank
lock between iterations (the per-subbank lock is taken inside
npc_subbank_alloc() / npc_subbank_suits() and released before the next
iteration step).
xa_for_each is RCU-safe so this is not a use-after-free, but a concurrent
SET can re-key xarray entries while the allocator is mid-walk, causing
the allocator to visit some subbanks twice and skip others entirely.
restrict_valid is also a plain bool read by npc_subbank_restrict_usage()
and npc_defrag_skip_restricted_sb() and a plain bool write by the setter,
with no READ_ONCE / WRITE_ONCE.
Should the allocator iteration also hold mcam->lock (or the relevant
subbank locks) for the duration of the walk, and should restrict_valid
use READ_ONCE / WRITE_ONCE?
mcam_allocation() is done under mutex_unlock(&mcam->lock); in mbox handler.
Mbox are handled serially by AF. So this case wont' happen.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help