Thread: 30 messages, 10 authors, 2022-11-10

Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64

From: Xu Kuohai <hidden>
Date: 2022-09-27 04:50:10
Also in: bpf, lkml

On 9/27/2022 1:43 AM, Mark Rutland wrote:
On Mon, Sep 26, 2022 at 03:40:20PM +0100, Catalin Marinas wrote:
On Thu, Sep 22, 2022 at 08:01:16PM +0200, Daniel Borkmann wrote:
On 9/13/22 6:27 PM, Xu Kuohai wrote:
This series adds ftrace direct call support for arm64, which is required to
attach a bpf trampoline to fentry.

Although there is no agreement yet on how to support ftrace direct calls on
arm64, no patch other than the one I posted in [1] has been posted, so this
series continues the work of [1] and adds long jump support. Now ftrace
direct calls work regardless of the distance between the callsite and the
custom trampoline.

[1] https://lore.kernel.org/bpf/20220518131638.3401509-2-xukuohai@huawei.com/

v2:
- Fix compile and runtime errors caused by ftrace_rec_arch_init

v1: https://lore.kernel.org/bpf/20220913063146.74750-1-xukuohai@huaweicloud.com/

Xu Kuohai (4):
    ftrace: Allow users to disable ftrace direct call
    arm64: ftrace: Support long jump for ftrace direct call
    arm64: ftrace: Add ftrace direct call support
    ftrace: Fix dead loop caused by direct call in ftrace selftest
Given that only a tiny fraction touches the BPF JIT and most of the changes are
in core arm64 code, it probably makes sense for this series to go via Catalin/Will
through the arm64 tree instead of bpf-next, if it looks good to them. Catalin/Will,
thoughts? (Ack + bpf-next could work too, but I'd presume that would just result
in merge conflicts.)
I think it makes sense for the series to go via the arm64 tree but I'd
like Mark to have a look at the ftrace changes first.
From a quick scan, I still don't think this is quite right, and as it stands I
believe this will break backtracing (as the instructions before the function
entry point will not be symbolized correctly, getting in the way of
RELIABLE_STACKTRACE). I think I was insufficiently clear with my earlier
feedback there, as I have a mechanism in mind that is a little simpler.
Thanks for the review. I have some thoughts on reliable stacktrace.

If the PC is not within literal_call, the stacktrace works as before, without
changes.

If the PC is within literal_call, for example because the task was interrupted
by an IRQ there, I think there are two problems:

1. The caller's LR has not been pushed to the stack yet, so the caller's
    address and name will be missing from the backtrace.

2. Since the PC is outside func's address range, no symbol will be found for
    it, so func's name is missing as well.
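To make problem 2 concrete, here is a minimal C sketch of a range-based symbol lookup. The table format and function names are hypothetical (this is not the kallsyms API), and `PRE_ENTRY_BYTES` assumes the literal_call region (an 8-byte literal plus LDR, MOV and BLR) sits immediately before the function entry with no padding. A PC inside that region falls outside every symbol's range, so the plain lookup returns NULL; `sym_lookup_fallback()` models the second remedy proposed further down in this message (adjust the PC and retry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol table entry (not the kernel's kallsyms API). */
struct sym {
	const char *name;
	uint64_t start;	/* function entry address */
	uint64_t size;	/* bytes covered by the symbol */
};

/*
 * Assumed size of the literal_call region placed immediately before each
 * patched function: an 8-byte literal plus LDR/MOV/BLR (3 x 4 bytes).
 * Illustrative only; a real layout may include alignment padding.
 */
#define PRE_ENTRY_BYTES 20u

/* Plain range lookup: the symbol whose [start, start + size) covers pc, or NULL. */
static const struct sym *sym_lookup(const struct sym *tab, size_t n, uint64_t pc)
{
	assert(n == 0 || tab != NULL);
	for (size_t i = 0; i < n; i++)
		if (pc >= tab[i].start && pc - tab[i].start < tab[i].size)
			return &tab[i];
	return NULL;
}

/*
 * Fallback: if the plain lookup misses and pc lies within PRE_ENTRY_BYTES
 * before some function's entry, adjust pc to that entry and look up again.
 */
static const struct sym *sym_lookup_fallback(const struct sym *tab, size_t n, uint64_t pc)
{
	const struct sym *s = sym_lookup(tab, n, pc);

	if (s)
		return s;
	for (size_t i = 0; i < n; i++)
		if (pc < tab[i].start && tab[i].start - pc <= PRE_ENTRY_BYTES)
			return sym_lookup(tab, n, tab[i].start);	/* retry with adjusted pc */
	return NULL;
}
```

For example, with `func` starting at 0x2000, a PC of 0x1ff4 (inside the assumed pre-entry region) misses in `sym_lookup()` but is attributed to `func` by `sym_lookup_fallback()`.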

Problem 1 is not introduced by this patchset, but this patchset may make it
more likely to occur. I think it should be addressed by a reliable stacktrace
scheme, such as ORC on x86.

Problem 2 is indeed introduced by this patchset. I think there are at least 3
ways to deal with it:

1. Add a symbol name for literal_call.

2. Hack the backtrace routine: if no symbol name is found for a PC during
    backtrace, check whether the PC is in literal_call, then adjust the PC and
    try again.

3. Move literal_call to the func's address range, for example:

         a. Compile with -fpatchable-function-entry=7
         func:
                 BTI C
                 NOP
                 NOP
                 NOP
                 NOP
                 NOP
                 NOP
                 NOP
         func_body:
                 ...


         b. When disabled, patch it to
         func:
                 BTI C
                 B func_body
         literal:
                 .quad dummy_tramp
         literal_call:
                 LDR X16, literal
                 MOV X9, LR
                 BLR X16
         func_body:
                 ...


         c. When enabled and the target is out of range, patch it to
         func:
                 BTI C
                 B literal_call
         literal:
                 .quad custom_trampoline
         literal_call:
                 LDR X16, literal
                 MOV X9, LR
                 BLR X16
         func_body:
                 ...


         d. When enabled and the target is in range, patch it to
         func:
                 BTI C
                 B direct_call
         literal:
                 .quad dummy_tramp
                 LDR X16, literal
         direct_call:
                 MOV X9, LR
                 BL custom_trampoline
         func_body:
                 ...
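The choice between layouts c and d hinges on whether a BL placed in the last patched slot before func_body can reach custom_trampoline. The following is a minimal C sketch, not code from the patchset: the enum, `bl_in_range()`, `encode_bl()` and `choose_state()` are hypothetical names, and `bl_pc` is assumed to be the address of that last slot. The ±128 MiB reach and the BL encoding (0x94000000 | imm26) come from the AArch64 instruction set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three patched states from the message (b, c and d). */
enum entry_state {
	ENTRY_DISABLED,		/* b: B func_body; literal = dummy_tramp */
	ENTRY_LITERAL_CALL,	/* c: B literal_call; literal = custom_trampoline */
	ENTRY_DIRECT_CALL,	/* d: B direct_call; last slot = BL custom_trampoline */
};

/*
 * AArch64 B/BL encode a signed 26-bit word offset, so a BL at pc reaches
 * [pc - 128 MiB, pc + 128 MiB - 4].
 */
static bool bl_in_range(uint64_t pc, uint64_t target)
{
	/* Modular subtraction, reinterpreted as a signed byte offset. */
	int64_t off = (int64_t)(target - pc);

	assert(((target - pc) & 3) == 0);	/* instructions are 4-byte aligned */
	return off >= -(INT64_C(1) << 27) && off <= (INT64_C(1) << 27) - 4;
}

/* Encode BL <target> for an instruction at pc (opcode 0b100101, imm26). */
static uint32_t encode_bl(uint64_t pc, uint64_t target)
{
	assert(bl_in_range(pc, target));
	/* Bits [27:2] of the two's-complement byte offset form imm26. */
	return 0x94000000u | (uint32_t)(((target - pc) >> 2) & 0x03ffffffu);
}

/*
 * Pick the entry layout. bl_pc is the address of the slot that holds
 * BL custom_trampoline in state d (the BL offset is relative to that slot).
 */
static enum entry_state choose_state(bool enabled, uint64_t bl_pc, uint64_t target)
{
	if (!enabled)
		return ENTRY_DISABLED;
	return bl_in_range(bl_pc, target) ? ENTRY_DIRECT_CALL : ENTRY_LITERAL_CALL;
}
```

Comparing the listings above, moving from b to c changes only the entry branch and the 8-byte literal, while moving from b to d changes only the entry branch and the last slot (BLR X16 becomes BL custom_trampoline).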

I'll try to reply with some more detail tomorrow, but I don't think this is the
right approach, and as mentioned previously (and e.g. at LPC) I'd strongly
prefer to *not* implement direct calls, so that we can have more consistent
entry/exit handling.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel