Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem
From: Ratheesh Kannoth <rkannoth@marvell.com>
Date: 2026-06-08 02:30:36
Also in:
lkml
On 2026-06-05 at 12:02:43, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
Flashing updated firmware on deployed devices is cumbersome. Provide a mechanism to load a custom KPU (Key Parse Unit) profile directly from the filesystem at module load time. When the rvu_af module is loaded with the kpu_profile parameter, the specified profile is read from /lib/firmware/kpu and programmed into the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format used by filesystem-loaded profiles and support ptype/ptype_mask in npc_config_kpucam when profile->from_fs is set. Usage: 1. Copy the KPU profile file to /lib/firmware/kpu. 2. Build OCTEONTX2_AF as a module. 3. Load: insmod rvu_af.ko kpu_profile=<profile_name> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says
quoted
+ struct npc_kpu_profile_action *action2; + struct npc_kpu_profile_cam2 *cam2; };The on-disk difference between npc_kpu_profile_cam and npc_kpu_profile_cam2 is two extra trailing bytes (ptype, ptype_mask). Could a single normalized in-memory representation work, with the FS loader filling ptype/ptype_mask to zero when reading the legacy layout? Every consumer now has to dispatch through accessor helpers (npc_get_kpu_cam_nth_entry(), npc_get_num_kpu_cam_entries(), npc_get_ikpu_nth_entry(), ...) gated on profile->from_fs, and any new consumer that forgets the accessor reads the wrong table silently. The names cam2 / action2 / ikpu2 also do not carry semantic meaning on their own — would something tied to the format (for example cam_with_ptype) read more clearly? The from_fs flag also conflates two orthogonal concerns: where the bytes came from, and which on-disk schema they use. Is it intentional that the new extended format cannot be shipped through the FW-DB path, and that a legacy-format profile cannot be loaded from /lib/firmware/kpu/?
This is a new feature request for customer, prior to this we did not have support for filesystem loading
+ u8 ptype = kpucam2->ptype;quoted
+ u8 pmask = kpucam2->ptype_mask; + + *val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask); + *mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask); + }The encoding writes ptype into bits [57:56] of the 64-bit CAM word, but struct npc_kpu_cam declares those bits as a reserved 8-bit field: struct npc_kpu_cam { #if defined(__BIG_ENDIAN_BITFIELD) u64 rsvd_63_56 : 8; u64 state : 8; ... Should the bitfield in npc_kpu_cam be extended with a named ptype member so the encoding is visible in the struct layout (and checkable by sparse)? ptype and ptype_mask in npc_kpu_profile_cam2 are u8 (8-bit), but only the low 2 bits land in the CAM word via GENMASK_ULL(57, 56); any user passing ptype with bits above bit 1 silently has those bits dropped. Is that intentional, and if so could the field width be enforced (range-checked on load) to surface bad firmware data? There is also no #define for the bit positions or for the constant 0x03 that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in this patch — a symbolic name would make the relationship between the two writes clear.
This is customer specific and all these are intentional.
that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in this patch — a symbolic name would make the relationship between the two writes clear. [ ... ]quoted
+static int npc_apply_custom_kpu_from_fs(struct rvu *rvu, + struct npc_kpu_profile_adapter *profile) +{ + size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0; + const struct npc_kpu_profile_fwdata *fw; + struct npc_kpu_profile_action *action; + struct npc_kpu_profile_cam2 *cam2; + struct npc_kpu_fwdata *fw_kpu; + int entries, ret, entry, kpu; fw = rvu->kpu_fwdata; + /* Binary blob contains ikpu actions entries at start of data[0] */ + profile->ikpu2 = devm_kcalloc(rvu->dev, 1, + sizeof(ikpu_action_entries), + GFP_KERNEL); + if (!profile->ikpu2) + return -ENOMEM; + + action = (struct npc_kpu_profile_action *)(fw->data + offset); + + if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries)) + return -EINVAL;Does the order here leave profile->ikpu2 charged to the device with uninitialized contents on the -EINVAL path? The devm_kcalloc() runs before the size check, and on error profile->from_fs is not set, so npc_get_ikpu_nth_entry() will not dereference ikpu2 today — but any future caller that consults profile->ikpu2 directly would see uninitialized memory at the right size. Could the validation move ahead of the allocation, with memcpy following both?
quoted
+ /* The firmware layout does dependent on the internal size of + * ikpu_action_entries. + */ + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries)); + offset += sizeof(ikpu_action_entries);The fwdata header declares the KPU count via fw->kpus but does not declare an ikpu/pkind count for this leading region — its size is implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree kernel array. The comment acknowledges this hidden ABI dependency. If ikpu_action_entries ever grows or shrinks in-tree, every previously distributed FS blob silently misaligns: the kernel reads the wrong number of bytes for ikpu actions, and all subsequent KPU CAM/action offsets are off. The size check "if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only ensures there are enough bytes — it does not ensure the bytes actually represent ikpu actions of that exact count. Could the on-disk format carry an explicit length / pkind count for the ikpu region so the kernel can validate and reject mismatched blobs?
if npc_apply_custom_kpu_from_fs() fails with above mentioned -EINVAL, npc_prepare_default_kpu(rvu, profile) again intialize profile->ikpu2 to static memory in npc_load_kpu_profile() function.
profile->ikpu2 directly would see uninitialized memory at the right size. Could the validation move ahead of the allocation, with memcpy following both?quoted
+ /* The firmware layout does dependent on the internal size of + * ikpu_action_entries. + */ + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries)); + offset += sizeof(ikpu_action_entries);The fwdata header declares the KPU count via fw->kpus but does not declare an ikpu/pkind count for this leading region — its size is implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree kernel array. The comment acknowledges this hidden ABI dependency. If ikpu_action_entries ever grows or shrinks in-tree, every previously distributed FS blob silently misaligns: the kernel reads the wrong number of bytes for ikpu actions, and all subsequent KPU CAM/action offsets are off. The size check "if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only ensures there are enough bytes — it does not ensure the bytes actually represent ikpu actions of that exact count. Could the on-disk format carry an explicit length / pkind count for the ikpu region so the kernel can validate and reject mismatched blobs?
There is no field (currently) in structure to indicate number of ikpu_action_entries in binary blob. So we keep the number same in all binary blob. We can't add the size as it will break backward compatability.
+ for (kpu = 0; kpu < fw->kpus; kpu++) { [ ... ]quoted
+ fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset); + if (fw_kpu->entries <= 0) { + dev_warn(rvu->dev, + "Invalid kpu entries on KPU%d\n", kpu); + return -EINVAL; + } + + entries = min_t(int, fw_kpu->entries, rvu->hw->npc_kpu_entries);[ ... ]quoted
+ cam2 = (struct npc_kpu_profile_cam2 *)fw_kpu->data; + offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam2); + action = (struct npc_kpu_profile_action *)(fw->data + offset); + offset += fw_kpu->entries * sizeof(*action); + if (rvu->kpu_fwdata_sz < hdr_sz + offset) {Could fw_kpu->entries get an upper bound check before being used in the offset arithmetic? The FW-DB sibling npc_apply_custom_kpu_from_fw() clamps via min_t(int, fw_kpu->entries, KPU_MAX_CST_ENT) before doing the arithmetic, while this path only clamps after. On a 64-bit size_t the post-arithmetic check correctly catches malformed blobs, but defensive symmetry with the FW-DB path would be nice.
OK. will add to net-next tree after this series is merged.
[ ... ]quoted
+ fw = rvu->kpu_fwdata; if (le64_to_cpu(fw->signature) != KPU_SIGN) { dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n", fw->signature);Two distinct on-disk binary formats now share the same KPU_SIGN ("kpuprof\0") signature and the same npc_kpu_profile_fwdata header. The FW-DB layout expects fwdata->data[0] to start directly with KPU entries; the new FS layout expects data[0] to start with sizeof(ikpu_action_entries) bytes of ikpu actions, then KPU entries encoded as npc_kpu_profile_cam2 (with ptype/ptype_mask) instead of npc_kpu_profile_cam. Nothing in npc_apply_custom_kpu() distinguishes the two; the choice is made by which load path is invoked (the from_fs argument). Is that a deliberate decision?
Yes. from_fs binary blob has npc_kpu_profile_cam2 format.
If a user accidentally drops a FW-DB-format blob into /lib/firmware/kpu, the signature passes, npc_apply_custom_kpu_from_fs() memcpys the first sizeof(ikpu_action_entries) bytes (which are KPU CAM/action data in that layout) into profile->ikpu2, and npc_parser_profile_init() then programs those bytes into NPC pkind action registers via npc_config_kpuaction(rvu, blkaddr, npc_get_ikpu_nth_entry(rvu, idx), 0, idx, true). The remaining KPU entries are read with the wrong layout (npc_kpu_profile_cam2 vs npc_kpu_profile_cam) on top. Could the blob carry a layout magic or version field independent of load source, and the loader reject mismatches?
User has to load correct FW to get requested feature. We will suggest customer your suggestion to have different signatures.
+ *fw_kpus = fw->kpus; The cn20k early return at the top of this function returns without writing *fw_kpus. Should that path also set *fw_kpus, or should the parameter be removed from cn20k call sites? Today npc_load_kpu_profile_from_fw() declares int fw_kpus = 0 and never reads it back, so the dead local hides the asymmetry. If cn20k is ever wired into the FS path the trailing "rvu->kpu.kpus = fw_kpus; profile->kpus = fw_kpus;" in npc_load_kpu_profile_from_fs() will silently zero kpus and disable KPU programming because *fw_kpus was never written.
Currently only cn10k has support for custom binary support from filesystem.
+ int ret, fw_kpus = 0;quoted
+ char path[512] = "kpu/"; + + if (strlen(kpu_profile) > sizeof(path) - strlen("kpu/") - 1) { + dev_err(rvu->dev, "kpu profile name is too big\n"); + return -ENOSPC; + } + + strcat(path, kpu_profile);rvu->kpu_pfl_name is char[KPU_NAME_LEN] (32 bytes), so strlen() of that buffer can never exceed sizeof(path) - 5 = 507. Is the strlen() check intended as defense-in-depth, or is a smaller path[KPU_NAME_LEN + sizeof("kpu/")] buffer (or snprintf(path, sizeof(path), "kpu/%.32s", kpu_profile)) preferable so the bound and intent line up?
We will add it in defensive pathset to net-next following this patchseries is merged.
quoted
+ rvu->kpu.kpus = fw_kpus; + profile->kpus = fw_kpus; + profile->from_fs = true; + return 0; +}The success path leaves rvu->kpu_fwdata_sz non-zero while rvu->kpu_fwdata has been set to NULL. The error path correctly zeroes both. npc_load_mkex_profile() and npc_cn20k_load_mkex_profile() consult kpu_fwdata_sz to decide whether to skip the FW-DB mkex lookup, so the asymmetry is harmless today, but any future consumer that uses kpu_fwdata_sz to gate a kpu_fwdata dereference will hit a NULL deref. Could kpu_fwdata_sz be cleared symmetrically on success?
This is intentional. We dont need to keep this symmetry.
quoted
+void npc_load_kpu_profile(struct rvu *rvu) +{[ ... ]quoted
+ /* Filesystem-based KPU loading is not supported on cn20k. + * npc_prepare_default_kpu() was invoked earlier, but control + * reached this point because the default profile was not selected. + * No need to call it again. + */ + if (!is_cn20k(rvu->pdev)) { + if (!npc_load_kpu_profile_from_fs(rvu)) + return; + } + + /* First prepare default KPU, then we'll customize top entries. */ + npc_prepare_default_kpu(rvu, profile); + if (!npc_load_kpu_profile_from_fw(rvu)) + return; npc_prepare_default_kpu(rvu, profile); }The comment says "No need to call it again" but the very next executable line after the if (!is_cn20k(...)) block is exactly npc_prepare_default_kpu(rvu, profile). The second call is in fact needed on non-cn20k to reset state corrupted by a partial FS attempt before the FW path runs. Could the comment be reworded to reflect that?
This indicate that npc_prepare_default_kpu() called above, so we dont have to callit before calling npc_load_kpu_profile_from_fs() in cn10k.
[ ... ]quoted
for (idx = 0; idx < num_kpus; idx++) npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]); + + if (profile->from_fs) { + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03); + rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03); + } }A few questions about these two writes: The pkind indices 54 and 58 match npc.h enum values NPC_RX_CPT_HDR_PTP_PKIND = 54 and NPC_RX_CPT_HDR_PKIND = 58 — could those named constants be used here so the relationship is explicit? The constant 0x03 has no symbolic name in rvu_reg.h, and there is no #define for the bits in NPC_AF_PKINDX_TYPE. Could a named macro be added that documents what value is being programmed? The same function carefully bounds pkind iteration earlier with "num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);" but these two writes target indices 54 and 58 unconditionally. hw->npc_pkinds comes from NPC_AF_CONST1[19:12] and is silicon-dependent — should these be gated on hw->npc_pkinds >= 59 (or on the corresponding pkind being defined) to avoid writing undefined CSRs on a future variant? Finally, is it intentional that this hardware register configuration is chosen based on the load mechanism (from_fs) rather than profile content? Any future FS profile that does not expect pkinds 54 and 58 to be configured this way would silently misbehave. Could the writes be driven from the profile data itself?
This is very customer specific and intentional. I did not find any speific names for these values as there are many values in npc_profie.h as well.