Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control
From: Ratheesh Kannoth <hidden>
Date: 2026-06-08 02:28:36
Also in:
lkml
On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
CN20K NPC MCAM is split into 32 subbanks that are searched in a predefined order during allocation. Lower-numbered subbanks have higher priority than higher-numbered ones. Add a runtime "srch_order" to control the order in which subbanks are searched during MCAM allocation. Signed-off-by: Ratheesh Kannoth <redacted>
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says
quoted
+ USED = 1, + }; + + if (cnt != npc_priv.num_subbanks) { + dev_err(rvu->dev, "Number of entries(%u) != %u\n", + cnt, npc_priv.num_subbanks); + return -EINVAL; + } + + mutex_lock(&mcam->lock); + npc_lock_all_subbank();Can the encoding of subbank_srch_order[] be reconciled between initialization and this setter? The boot-time fill done by npc_create_srch_order() and npc_subbank_init() populates the array as subbank_srch_order[subbank_idx] = priority (i.e. indexed by subbank, value is priority). For cnt = 8 it produces [3, 5, 2, 6, 1, 7, 0, 4], whose actual hardware search order is sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5. The new setter writes the same array back as subbank_srch_order[priority] = subbank_idx (indexed by priority, value is subbank): for (int i = 0; i < cnt; i++) subbank_srch_order[i] = (u32)narr[i]; If a user runs devlink dev param show (which calls the GET path returning the boot-time encoding) and then re-applies the same array via devlink dev param set, the SET path treats it as a permutation of subbank indices and produces search order sb3, sb5, sb2, sb6, sb1, sb7, sb0, sb4 instead of the previous order. The validator also checks that arr[i] in 0..sz-1 occurs exactly once each, which matches the SET form but is also true of the boot-time content (subbank_srch_order[7] == 4 in the example), so the validator does not catch the round-trip mismatch. Should both writers agree on a single encoding so that show + set round-trips are stable?
Agree, but npc_create_srch_order() and npc_subbank_init() are not added by this function and is preexisting issue. WIll send patch to "net" tree soon after this series is merged.
+ xa_erase(&npc_priv.xa_sb_used, prio);quoted
+ } + + for (int i = 0; i < cnt; i++) + subbank_srch_order[i] = (u32)narr[i]; + + restrict_valid = false;Is the unconditional restrict_valid = false here intentional, and is there a way to undo it?
No. This is intentional.
The cleanup loop directly above does not only erase newly-added entries. The check rsrc[FREE][prio] == 1 (and the USED counterpart) also fires for slots whose successful iteration replaced a pre-existing entry whose paired decrementing iteration never ran due to the failure. For example, with sb 0 FREE arr_idx = 0 and sb 1 FREE arr_idx = 1 and narr = [1, 0]: - iter prio = 0 succeeds: xa_sb_free[0] is replaced from 0 to 1, rsrc[FREE][0] becomes +1, rsrc[FREE][1] becomes -1. - iter prio = 1 fails. - cleanup loop sees rsrc[FREE][0] == 1 and erases xa_sb_free[0]. The rollback loop then calls xa_store(xa_sb_free, save[0] = 0, sb 0) against an empty slot, which does need a fresh xarray node under GFP_KERNEL. If that allocation fails, the only response is WARN(!!err, ...) and the function returns the original error. After such a failure, sb->arr_idx is restored but the corresponding xa_sb_used / xa_sb_free entry is gone. Would future MCAM allocations via npc_subbank_noref_alloc() then silently skip those subbanks? Could xa_reserve() before the destructive phase, or not pre-erasing entries in the cleanup loop, make rollback unconditionally safe?
Suppose there are only two subbnaks and both are free. When first loop runs. rsrc[FREE][0] = 1 - 1 = 0 rsrc[FREE][1] = 1 - 1 = 0 xa_erse check if any value is positive, in this case nothing is positive, nothing will be removed. So rollback will be safe.
quoted
+ + npc_unlock_all_subbank(); + mutex_unlock(&mcam->lock); + + return rc; +}Is the locking on the allocator side sufficient to make this setter atomic with respect to MCAM allocation?
Yes. allocation is done in mbox handler; which is protected by mcam->lock.
Is the locking on the allocator side sufficient to make this setter
atomic with respect to MCAM allocation?
The setter takes mcam->lock plus all per-subbank locks and rewrites
xa_sb_used / xa_sb_free, modifies every sb->arr_idx, and toggles
restrict_valid.
The allocation hot path
rvu_mbox_handler_npc_mcam_alloc_entry()
-> npc_mcam_alloc_entries()
-> npc_cn20k_ref_idx_alloc()
-> npc_subbank_noref_alloc()
iterates xa_for_each(&npc_priv.xa_sb_used, ...) and
xa_for_each(&npc_priv.xa_sb_free, ...) without mcam->lock held
(npc_mcam_alloc_entries() takes mcam->lock only after
npc_cn20k_ref_idx_alloc() returns) and without holding any per-subbank
lock between iterations (the per-subbank lock is taken inside
npc_subbank_alloc() / npc_subbank_suits() and released before the next
iteration step).
xa_for_each is RCU-safe so this is not a use-after-free, but a concurrent
SET can re-key xarray entries while the allocator is mid-walk, causing
the allocator to visit some subbanks twice and skip others entirely.
restrict_valid is also a plain bool read by npc_subbank_restrict_usage()
and npc_defrag_skip_restricted_sb() and a plain bool write by the setter,
with no READ_ONCE / WRITE_ONCE.
Should the allocator iteration also hold mcam->lock (or the relevant
subbank locks) for the duration of the walk, and should restrict_valid
use READ_ONCE / WRITE_ONCE?mcam_allocation() is done under mutex_unlock(&mcam->lock); in mbox handler. Mbox are handled serially by AF. So this case wont' happen.