[PATCH v6 2/6] powerpc/bpf: Move out dummy_tramp_addr after Long branch stub
From: <hidden>
Date: 2026-05-28 21:59:17
Also in: bpf, linux-kselftest, stable
Subsystem: bpf jit for powerpc (32-bit and 64-bit), bpf [general] (safe dynamic programs and tools), linux for powerpc (32-bit and 64-bit), the rest
Maintainers: Hari Bathini, Christophe Leroy, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman, Kumar Kartikeya Dwivedi, Madhavan Srinivasan, Michael Ellerman, Linus Torvalds
From: Abhishek Dubey <redacted>
Move the long branch address field (dummy_tramp_addr) to the bottom of
the long branch stub. Because the address is no longer placed between
instructions, the JITed program can be disassembled without
interruption up to its last 8 bytes. A disassembler only has to skip
those final bytes to avoid failing on them. The field is only moved
within the program, so the overall program size is unchanged.
Also, keep the dummy_tramp_addr field aligned to an 8-byte boundary at
its new location.
Below is the disassembler output for a test program with the
dummy_tramp_addr field moved down:
.....
.....
pc:68 left:44 a6 03 08 7c : mtlr 0
pc:72 left:40 bc ff ff 4b : b .-68
pc:76 left:36 a6 02 68 7d : mflr 11
pc:80 left:32 05 00 9f 42 : bcl 20, 31, .+4
pc:84 left:28 a6 02 88 7d : mflr 12
pc:88 left:24 14 00 8c e9 : ld 12, 20(12)
pc:92 left:20 a6 03 89 7d : mtctr 12
pc:96 left:16 a6 03 68 7d : mtlr 11
pc:100 left:12 20 04 80 4e : bctr
pc:104 left:8 c0 34 1d 00 :
Failure log:
Can't disasm instruction at offset 104: c0 34 1d 00 00 00 00 c0
The disassembly logic can therefore stop at offset 104 and ignore the
last 8 bytes.
Update the calculation of the dummy_tramp_addr offset from the end of
the program to match its new location, so that bpf_arch_text_poke()
writes the actual trampoline address into this field.
All BPF trampoline selftests continue to pass with this patch applied.
Reported-by: bot+bpf-ci@kernel.org
Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Dubey <redacted>
---
arch/powerpc/net/bpf_jit_comp.c | 46 +++++++++++++++++----------------
1 file changed, 24 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 3492d82d147f..9885a68f64f4 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -52,9 +52,10 @@ asm (
 void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context *ctx)
 {
 	int ool_stub_idx, long_branch_stub_idx;
-	int ool_instrs;
+	int stubs_instrs;
 
 	/*
+	 * The dummy_tramp_addr field is placed at bottom of Long branch stub.
 	 * In the final pass, align the mis-aligned dummy_tramp_addr field
 	 * in the fimage. The alignment NOP must appear before OOL stub,
 	 * to make ool_stub_idx & long_branch_stub_idx constant from end.
@@ -62,13 +63,10 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
 	 * dummy_tramp_addr must be 8-byte aligned for load-register
 	 * compatibility. The fimage can be non 8-byte aligned, so final
 	 * alignment depends on start of fimage and the stub's instruction
-	 * count offset. The OOL stub has 4 instructions (with
-	 * CONFIG_PPC_FTRACE_OUT_OF_LINE) or 3 instructions (without)
-	 * before dummy_tramp_addr.
-	 *
-	 * Emit a NOP here if (ctx->idx + ool_instrs) is odd, so that
-	 * dummy_tramp_addr lands at an even instruction offset (== 8-byte
-	 * aligned from an 8-byte aligned base).
+	 * count. The stubs block has 11 instructions (with
+	 * CONFIG_PPC_FTRACE_OUT_OF_LINE) or 10 instructions (without)
+	 * before dummy_tramp_addr field. Emit a NOP if the address of
+	 * dummy_tramp_addr is non aligned.
 	 *
 	 * In pass=0 when image==NULL, conservatively account for space
 	 * required to accommodate alignment NOP. In case final pass skips
@@ -76,8 +74,8 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
 	 * jited_len signifies correct program size.
 	 */
 
-	ool_instrs = IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 4*4 : 3*4;
-	if (!image || !IS_ALIGNED((unsigned long)fimage + ctx->idx*4 + ool_instrs, 8))
+	stubs_instrs = IS_ENABLED(CONFIG_PPC_FTRACE_OUT_OF_LINE) ? 11*4 : 10*4;
+	if (!image || !IS_ALIGNED((unsigned long)fimage + ctx->idx*4 + stubs_instrs, 8))
 		EMIT(PPC_RAW_NOP());
 
 	/*
@@ -98,28 +96,29 @@ void bpf_jit_build_fentry_stubs(u32 *image, u32 *fimage, struct codegen_context
 
 	/*
 	 * Long branch stub:
-	 * .long <dummy_tramp_addr>	// 8-byte aligned
 	 * mflr r11
 	 * bcl 20,31,$+4
-	 * mflr r12
-	 * ld r12, -8-SZL(r12)
+	 * mflr r12		// lr/r12 stores pc of current(this) inst.
+	 * ld r12, 20(r12)	// offset(dummy_tramp_addr) from prev inst. is 20
 	 * mtctr r12
-	 * mtlr r11 // needed to retain ftrace ABI
+	 * mtlr r11		// needed to retain ftrace ABI
 	 * bctr
+	 * .long <dummy_tramp_addr>	// 8-byte aligned
 	 */
-	if (image)
-		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
-
-	ctx->idx += SZL / 4;
 	long_branch_stub_idx = ctx->idx;
 	EMIT(PPC_RAW_MFLR(_R11));
 	EMIT(PPC_RAW_BCL4());
 	EMIT(PPC_RAW_MFLR(_R12));
-	EMIT(PPC_RAW_LL(_R12, _R12, -8-SZL));
+	EMIT(PPC_RAW_LL(_R12, _R12, 20));
 	EMIT(PPC_RAW_MTCTR(_R12));
 	EMIT(PPC_RAW_MTLR(_R11));
 	EMIT(PPC_RAW_BCTR());
 
+	if (image)
+		*((unsigned long *)&image[ctx->idx]) = (unsigned long)dummy_tramp;
+
+	ctx->idx += SZL / 4;
+
 	if (!bpf_jit_ool_stub) {
 		bpf_jit_ool_stub = (ctx->idx - ool_stub_idx) * 4;
 		bpf_jit_long_branch_stub = (ctx->idx - long_branch_stub_idx) * 4;
@@ -1289,6 +1288,7 @@ static void do_isync(void *info __maybe_unused)
 	 * bpf_func:
 	 *	[nop|b]	ool_stub
 	 * 2. Out-of-line stub:
+	 *	nop	// optional nop for alignment
 	 * ool_stub:
 	 *	mflr	r0
 	 *	[b|bl]	<bpf_prog>/<long_branch_stub>
@@ -1296,14 +1296,14 @@ static void do_isync(void *info __maybe_unused)
 	 *	b	bpf_func + 4
 	 * 3. Long branch stub:
 	 * long_branch_stub:
-	 *	.long	<branch_addr>/<dummy_tramp>
 	 *	mflr	r11
 	 *	bcl	20,31,$+4
 	 *	mflr	r12
-	 *	ld	r12, -16(r12)
+	 *	ld	r12, 20(r12)
 	 *	mtctr	r12
 	 *	mtlr	r11 // needed to retain ftrace ABI
 	 *	bctr
+	 *	.long	<branch_addr>/<dummy_tramp>
 	 *
 	 * dummy_tramp is used to reduce synchronization requirements.
 	 *
@@ -1405,10 +1405,12 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
 	 * 1. Update the address in the long branch stub:
 	 * If new_addr is out of range, we will have to use the long branch stub, so patch new_addr
 	 * here. Otherwise, revert to dummy_tramp, but only if we had patched old_addr here.
+	 *
+	 * dummy_tramp_addr moved to bottom of long branch stub.
 	 */
 	if ((new_addr && !is_offset_in_branch_range(new_addr - ip)) ||
 	    (old_addr && !is_offset_in_branch_range(old_addr - ip)))
-		ret = patch_ulong((void *)(bpf_func_end - bpf_jit_long_branch_stub - SZL),
+		ret = patch_ulong((void *)(bpf_func_end - SZL), /* SZL: dummy_tramp_addr offset */
 				  (new_addr && !is_offset_in_branch_range(new_addr - ip)) ?
 				  (unsigned long)new_addr : (unsigned long)dummy_tramp);
 	if (ret)
--
2.52.0