Thread: 36 messages, 5 authors, 2024-10-28

Re: [PATCH v5 17/17] powerpc64/bpf: Add support for bpf trampolines

From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2024-10-28 05:46:54
Also in: bpf, linux-kbuild, linux-trace-kernel, lkml

Hari Bathini [off-list ref] writes:
On 10/10/24 3:09 pm, Hari Bathini wrote:
On 10/10/24 5:48 am, Michael Ellerman wrote:
Alexei Starovoitov [off-list ref] writes:
On Tue, Oct 1, 2024 at 12:18 AM Hari Bathini [off-list ref] wrote:
On 30/09/24 6:25 pm, Alexei Starovoitov wrote:
On Sun, Sep 29, 2024 at 10:33 PM Hari Bathini [off-list ref] wrote:
On 17/09/24 1:20 pm, Alexei Starovoitov wrote:
On Sun, Sep 15, 2024 at 10:58 PM Hari Bathini [off-list ref] wrote:
+
+       /*
+        * Generated stack layout:
+        *
+        * func prev back chain         [ back chain        ]
+        *                              [                   ]
+        * bpf prog redzone/tailcallcnt [ ...               ] 64 bytes (64-bit powerpc)
+        *                              [                   ] --
...
+
+       /* Dummy frame size for proper unwind - includes 64-bytes red zone for 64-bit powerpc */
+       bpf_dummy_frame_size = STACK_FRAME_MIN_SIZE + 64;
What is the goal of such a large "red zone" ?
The kernel stack is a limited resource.
Why reserve 64 bytes ?
tail call cnt can probably be optional as well.
Hi Alexei, thanks for reviewing.
FWIW, the redzone on ppc64 is 288 bytes. BPF JIT for ppc64 was using
a redzone of 80 bytes since tailcall support was introduced [1].
It came down to 64 bytes thanks to [2]. The red zone is being used
to save NVRs and tail call count when a stack is not setup. I do
agree that we should look at optimizing it further. Do you think
the optimization should go as part of PPC64 trampoline enablement
being done here or should that be taken up as a separate item, maybe?
The follow up is fine.
It's just odd to me that we currently have:

[   unused red zone ] 208 bytes protected

I simply don't understand why we need to waste this much stack space.
Why can't it be zero today ?
The ABI for ppc64 has a redzone of 288 bytes below the current
stack pointer that can be used as a scratch area until a new
stack frame is created. So no stack space is wasted as such;
it is just red zone that can be used before a new stack frame
is created. The comment there is only to show how the red zone is
being used in the ppc64 BPF JIT. I think the confusion is with the
mention of "208 bytes" as protected. As not all of that scratch
area is used, the comment marks the remainder as unused. Essentially,
the 288 bytes below the current stack pointer are protected from
debuggers and interrupt code (the red zone). Note that it should say
224 bytes of unused red zone instead of 208 bytes, as red zone usage
in the ppc64 BPF JIT came down from 80 bytes to 64 bytes since [2].
Hope that clears up the misunderstanding.
I see. That makes sense. So it's similar to amd64 red zone,
but there we have an issue with irqs, hence the kernel is
compiled with -mno-red-zone.
I assume that issue is that the interrupt entry unconditionally writes
some data below the stack pointer, disregarding the red zone?
I guess ppc always has a different interrupt stack and
it's not an issue?
No, the interrupt entry allocates a frame that is big enough to cover
the red zone as well as the space it needs to save registers.

See STACK_INT_FRAME_SIZE which includes KERNEL_REDZONE_SIZE:

   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/asm/ptrace.h?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n165

Which is renamed to INT_FRAME_SIZE in asm-offsets.c and then is used in
the interrupt entry here:

   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/kernel/exceptions-64s.S?commit=8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b#n497
Thanks for clarifying that, Michael.
Only async interrupt handlers use different interrupt stacks, right?
... and separate emergency stack for some special cases...
There isn't a neat rule like sync/async.

Most interrupts use the normal kernel stack, whether sync or async.

External interrupts switch to a separate hard interrupt stack
(hardirq_ctx) in call_do_irq(), but only after coming in on the kernel
stack first.

Some interrupts use the emergency stack (in some cases), e.g. HMI, soft
NMI (fake), and TM bad thing (program check), or their own stack: system
reset (nmi_emergency_sp) and machine check (mc_emergency_sp).

cheers