Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU... | netdev

Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem

From: Ratheesh Kannoth <rkannoth@marvell.com>
Date: 2026-06-08 02:30:36
Also in: lkml

On 2026-06-05 at 12:02:43, Ratheesh Kannoth (rkannoth@marvell.com) wrote:

Flashing updated firmware on deployed devices is cumbersome. Provide a
mechanism to load a custom KPU (Key Parse Unit) profile directly from
the filesystem at module load time.

When the rvu_af module is loaded with the kpu_profile parameter, the
specified profile is read from /lib/firmware/kpu and programmed into
the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
used by filesystem-loaded profiles and support ptype/ptype_mask in
npc_config_kpucam when profile->from_fs is set.

Usage:
  1. Copy the KPU profile file to /lib/firmware/kpu.
  2. Build OCTEONTX2_AF as a module.
  3. Load: insmod rvu_af.ko kpu_profile=<profile_name>

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

quoted

+	struct npc_kpu_profile_action *action2;
+	struct npc_kpu_profile_cam2 *cam2;
 };

The on-disk difference between npc_kpu_profile_cam and
npc_kpu_profile_cam2 is two extra trailing bytes (ptype, ptype_mask).
Could a single normalized in-memory representation work, with the FS
loader filling ptype/ptype_mask to zero when reading the legacy layout?
Every consumer now has to dispatch through accessor helpers
(npc_get_kpu_cam_nth_entry(), npc_get_num_kpu_cam_entries(),
npc_get_ikpu_nth_entry(), ...) gated on profile->from_fs, and any new
consumer that forgets the accessor reads the wrong table silently.
The names cam2 / action2 / ikpu2 also do not carry semantic meaning on
their own — would something tied to the format (for example
cam_with_ptype) read more clearly?
The from_fs flag also conflates two orthogonal concerns: where the bytes
came from, and which on-disk schema they use. Is it intentional that the
new extended format cannot be shipped through the FW-DB path, and that a
legacy-format profile cannot be loaded from /lib/firmware/kpu/?

This is a new feature request for customer, prior to this we did not have support
for filesystem loading

+		u8 ptype = kpucam2->ptype;

quoted

+		u8 pmask = kpucam2->ptype_mask;
+
+		*val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask);
+		*mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask);
+	}

The encoding writes ptype into bits [57:56] of the 64-bit CAM word, but
struct npc_kpu_cam declares those bits as a reserved 8-bit field:
   struct npc_kpu_cam {
   #if defined(__BIG_ENDIAN_BITFIELD)
       u64 rsvd_63_56     : 8;
       u64 state          : 8;
       ...
Should the bitfield in npc_kpu_cam be extended with a named ptype member
so the encoding is visible in the struct layout (and checkable by sparse)?
ptype and ptype_mask in npc_kpu_profile_cam2 are u8 (8-bit), but only the
low 2 bits land in the CAM word via GENMASK_ULL(57, 56); any user passing
ptype with bits above bit 1 silently has those bits dropped. Is that
intentional, and if so could the field width be enforced (range-checked
on load) to surface bad firmware data?
There is also no #define for the bit positions or for the constant 0x03
that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in
this patch — a symbolic name would make the relationship between the
two writes clear.

This is customer specific and all these are intentional.

that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in
this patch — a symbolic name would make the relationship between the
two writes clear.
[ ... ]

quoted

+static int npc_apply_custom_kpu_from_fs(struct rvu *rvu,
+					struct npc_kpu_profile_adapter *profile)
+{
+	size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
+	const struct npc_kpu_profile_fwdata *fw;
+	struct npc_kpu_profile_action *action;
+	struct npc_kpu_profile_cam2 *cam2;
+	struct npc_kpu_fwdata *fw_kpu;
+	int entries, ret, entry, kpu;

 	fw = rvu->kpu_fwdata;

+	/* Binary blob contains ikpu actions entries at start of data[0] */
+	profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
+				      sizeof(ikpu_action_entries),
+				      GFP_KERNEL);
+	if (!profile->ikpu2)
+		return -ENOMEM;
+
+	action = (struct npc_kpu_profile_action *)(fw->data + offset);
+
+	if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
+		return -EINVAL;

Does the order here leave profile->ikpu2 charged to the device with
uninitialized contents on the -EINVAL path?
The devm_kcalloc() runs before the size check, and on error
profile->from_fs is not set, so npc_get_ikpu_nth_entry() will not
dereference ikpu2 today — but any future caller that consults
profile->ikpu2 directly would see uninitialized memory at the right size.
Could the validation move ahead of the allocation, with memcpy following
both?

quoted

+	/* The firmware layout does dependent on the internal size of
+	 * ikpu_action_entries.
+	 */
+	memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
+	offset += sizeof(ikpu_action_entries);

The fwdata header declares the KPU count via fw->kpus but does not
declare an ikpu/pkind count for this leading region — its size is
implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree
kernel array. The comment acknowledges this hidden ABI dependency.
If ikpu_action_entries ever grows or shrinks in-tree, every previously
distributed FS blob silently misaligns: the kernel reads the wrong
number of bytes for ikpu actions, and all subsequent KPU CAM/action
offsets are off. The size check
"if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only
ensures there are enough bytes — it does not ensure the bytes actually
represent ikpu actions of that exact count.
Could the on-disk format carry an explicit length / pkind count for the
ikpu region so the kernel can validate and reject mismatched blobs?

if  npc_apply_custom_kpu_from_fs() fails with above mentioned -EINVAL,
npc_prepare_default_kpu(rvu, profile) again intialize profile->ikpu2 to
static memory in npc_load_kpu_profile() function.

profile->ikpu2 directly would see uninitialized memory at the right size.
Could the validation move ahead of the allocation, with memcpy following
both?

quoted

+	/* The firmware layout does dependent on the internal size of
+	 * ikpu_action_entries.
+	 */
+	memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
+	offset += sizeof(ikpu_action_entries);

The fwdata header declares the KPU count via fw->kpus but does not
declare an ikpu/pkind count for this leading region — its size is
implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree
kernel array. The comment acknowledges this hidden ABI dependency.
If ikpu_action_entries ever grows or shrinks in-tree, every previously
distributed FS blob silently misaligns: the kernel reads the wrong
number of bytes for ikpu actions, and all subsequent KPU CAM/action
offsets are off. The size check
"if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only
ensures there are enough bytes — it does not ensure the bytes actually
represent ikpu actions of that exact count.
Could the on-disk format carry an explicit length / pkind count for the
ikpu region so the kernel can validate and reject mismatched blobs?

There is no field (currently) in structure to indicate number of ikpu_action_entries
in binary blob. So we keep the number same in all binary blob. We can't add the size
as it will break backward compatability.

+	for (kpu = 0; kpu < fw->kpus; kpu++) {
[ ... ]

quoted

+		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
+		if (fw_kpu->entries <= 0) {
+			dev_warn(rvu->dev,
+				 "Invalid kpu entries on KPU%d\n", kpu);
+			return -EINVAL;
+		}
+
+		entries = min_t(int, fw_kpu->entries, rvu->hw->npc_kpu_entries);

[ ... ]

quoted

+		cam2 = (struct npc_kpu_profile_cam2 *)fw_kpu->data;
+		offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam2);
+		action = (struct npc_kpu_profile_action *)(fw->data + offset);
+		offset += fw_kpu->entries * sizeof(*action);
+		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {

Could fw_kpu->entries get an upper bound check before being used in the
offset arithmetic? The FW-DB sibling npc_apply_custom_kpu_from_fw()
clamps via min_t(int, fw_kpu->entries, KPU_MAX_CST_ENT) before doing the
arithmetic, while this path only clamps after. On a 64-bit size_t the
post-arithmetic check correctly catches malformed blobs, but defensive
symmetry with the FW-DB path would be nice.

OK. will add to net-next tree after this series is merged.

[ ... ]

quoted

+	fw = rvu->kpu_fwdata;
 	if (le64_to_cpu(fw->signature) != KPU_SIGN) {
 		dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n",
 			 fw->signature);

Two distinct on-disk binary formats now share the same KPU_SIGN
("kpuprof\0") signature and the same npc_kpu_profile_fwdata header.
The FW-DB layout expects fwdata->data[0] to start directly with KPU
entries; the new FS layout expects data[0] to start with
sizeof(ikpu_action_entries) bytes of ikpu actions, then KPU entries
encoded as npc_kpu_profile_cam2 (with ptype/ptype_mask) instead of
npc_kpu_profile_cam.
Nothing in npc_apply_custom_kpu() distinguishes the two; the choice is
made by which load path is invoked (the from_fs argument). Is that a
deliberate decision?

Yes. from_fs binary blob has npc_kpu_profile_cam2 format.

If a user accidentally drops a FW-DB-format blob into /lib/firmware/kpu,
the signature passes, npc_apply_custom_kpu_from_fs() memcpys the first
sizeof(ikpu_action_entries) bytes (which are KPU CAM/action data in that
layout) into profile->ikpu2, and npc_parser_profile_init() then programs
those bytes into NPC pkind action registers via
npc_config_kpuaction(rvu, blkaddr, npc_get_ikpu_nth_entry(rvu, idx),
0, idx, true). The remaining KPU entries are read with the wrong layout
(npc_kpu_profile_cam2 vs npc_kpu_profile_cam) on top.
Could the blob carry a layout magic or version field independent of
load source, and the loader reject mismatches?

User has to load correct FW to get requested feature. We will suggest customer
your suggestion to have different signatures.

+	*fw_kpus = fw->kpus;
The cn20k early return at the top of this function returns without
writing *fw_kpus. Should that path also set *fw_kpus, or should the
parameter be removed from cn20k call sites?
Today npc_load_kpu_profile_from_fw() declares int fw_kpus = 0 and never
reads it back, so the dead local hides the asymmetry. If cn20k is ever
wired into the FS path the trailing
"rvu->kpu.kpus = fw_kpus; profile->kpus = fw_kpus;" in
npc_load_kpu_profile_from_fs() will silently zero kpus and disable KPU
programming because *fw_kpus was never written.

Currently only cn10k has support for custom binary support from filesystem.

+	int ret, fw_kpus = 0;

quoted

+	char path[512] = "kpu/";
+
+	if (strlen(kpu_profile) > sizeof(path) - strlen("kpu/") - 1) {
+		dev_err(rvu->dev, "kpu profile name is too big\n");
+		return -ENOSPC;
+	}
+
+	strcat(path, kpu_profile);

rvu->kpu_pfl_name is char[KPU_NAME_LEN] (32 bytes), so strlen() of that
buffer can never exceed sizeof(path) - 5 = 507. Is the strlen() check
intended as defense-in-depth, or is a smaller path[KPU_NAME_LEN +
sizeof("kpu/")] buffer (or snprintf(path, sizeof(path), "kpu/%.32s",
kpu_profile)) preferable so the bound and intent line up?

We will add it in defensive pathset to net-next following this patchseries
is merged.

quoted

+	rvu->kpu.kpus = fw_kpus;
+	profile->kpus = fw_kpus;
+	profile->from_fs = true;
+	return 0;
+}

The success path leaves rvu->kpu_fwdata_sz non-zero while
rvu->kpu_fwdata has been set to NULL. The error path correctly zeroes
both.
npc_load_mkex_profile() and npc_cn20k_load_mkex_profile() consult
kpu_fwdata_sz to decide whether to skip the FW-DB mkex lookup, so the
asymmetry is harmless today, but any future consumer that uses
kpu_fwdata_sz to gate a kpu_fwdata dereference will hit a NULL deref.
Could kpu_fwdata_sz be cleared symmetrically on success?

This is intentional. We dont need to keep this symmetry.

quoted

+void npc_load_kpu_profile(struct rvu *rvu)
+{

[ ... ]

quoted

+	/* Filesystem-based KPU loading is not supported on cn20k.
+	 * npc_prepare_default_kpu() was invoked earlier, but control
+	 * reached this point because the default profile was not selected.
+	 * No need to call it again.
+	 */
+	if (!is_cn20k(rvu->pdev)) {
+		if (!npc_load_kpu_profile_from_fs(rvu))
+			return;
+	}
+
+	/* First prepare default KPU, then we'll customize top entries. */
+	npc_prepare_default_kpu(rvu, profile);
+	if (!npc_load_kpu_profile_from_fw(rvu))
+		return;

 	npc_prepare_default_kpu(rvu, profile);
 }

The comment says "No need to call it again" but the very next executable
line after the if (!is_cn20k(...)) block is exactly
npc_prepare_default_kpu(rvu, profile). The second call is in fact needed
on non-cn20k to reset state corrupted by a partial FS attempt before the
FW path runs. Could the comment be reworded to reflect that?

This indicate that npc_prepare_default_kpu() called above, so we dont have to callit before
calling npc_load_kpu_profile_from_fs() in cn10k.

[ ... ]

quoted

 	for (idx = 0; idx < num_kpus; idx++)
 		npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]);
+
+	if (profile->from_fs) {
+		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03);
+		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03);
+	}
 }

A few questions about these two writes:
The pkind indices 54 and 58 match npc.h enum values
NPC_RX_CPT_HDR_PTP_PKIND = 54 and NPC_RX_CPT_HDR_PKIND = 58 — could
those named constants be used here so the relationship is explicit?
The constant 0x03 has no symbolic name in rvu_reg.h, and there is no
#define for the bits in NPC_AF_PKINDX_TYPE. Could a named macro be added
that documents what value is being programmed?
The same function carefully bounds pkind iteration earlier with
"num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);" but these two
writes target indices 54 and 58 unconditionally. hw->npc_pkinds comes
from NPC_AF_CONST1[19:12] and is silicon-dependent — should these be
gated on hw->npc_pkinds >= 59 (or on the corresponding pkind being
defined) to avoid writing undefined CSRs on a future variant?
Finally, is it intentional that this hardware register configuration is
chosen based on the load mechanism (from_fs) rather than profile
content? Any future FS profile that does not expect pkinds 54 and 58 to
be configured this way would silently misbehave. Could the writes be
driven from the profile data itself?

This is very customer specific and intentional. I did not find any speific
names for these values as there are many values in npc_profie.h as well.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help