Re: [PATCH 2/2] x86/retpoline,kprobes: Avoid treating rethunk as an indirect jump
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2023-07-07 14:39:27
Also in:
lkml
On Thu, 6 Jul 2023 13:34:03 +0200 Peter Zijlstra [off-list ref] wrote:
> On Thu, Jul 06, 2023 at 06:00:14PM +0900, Masami Hiramatsu wrote:
>> On Thu, 6 Jul 2023 09:17:05 +0200 Peter Zijlstra [off-list ref] wrote:
>>> On Thu, Jul 06, 2023 at 09:47:23AM +0900, Masami Hiramatsu wrote:
>>>>>> If I understand correctly, all indirect jumps will be replaced with JMP_NOSPEC. If you read insn_jump_into_range(), it only checks the jump code, not call. So functions that only have indirect calls still allow optprobe.
>>>>>
>>>>> With the introduction of kCFI, JMP_NOSPEC is no longer equivalent to a C indirect jump.
>>>>
>>>> If I understand correctly, kCFI is enabled by CFI_CLANG, and clang does not use jump tables by default, so we can focus on gcc. In that case the current check still works, correct?
>>>
>>> IIRC clang can use jump tables, but like GCC it needs RETPOLINE=n and IBT=n, so effectively nobody has them.
>>
>> So if it requires RETPOLINE=n, the current __indirect_thunk_start/end check is not required, right? (That code is enclosed in "#ifdef CONFIG_RETPOLINE".)
>
> Correct.
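To make the jump/call distinction concrete, here is a minimal user-space sketch of the two checks discussed above. It is not the kernel's code: insn_jump_into_range() works on a decoded struct insn, while this model hand-decodes only jmp rel8 (EB) and jmp rel32 (E9). The helper names and the thunk-range parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the target of a relative jmp at 'ip'.
 * Only jmp rel8 (EB) and jmp rel32 (E9) are decoded; anything else,
 * including call rel32 (E8), returns false, mirroring "it only checks
 * the jump code, not call". */
static bool rel_jump_target(const uint8_t *code, uint64_t ip, uint64_t *target)
{
	if (code[0] == 0xeb) {                          /* jmp rel8 */
		*target = ip + 2 + (int64_t)(int8_t)code[1];
		return true;
	}
	if (code[0] == 0xe9) {                          /* jmp rel32, little-endian */
		uint32_t raw = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
			       ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
		*target = ip + 5 + (int64_t)(int32_t)raw;
		return true;
	}
	return false;
}

/* Does the jump land inside [start, start + len), the bytes an optprobe
 * would overwrite with its own jmp? (Exclusive end is this model's choice.) */
static bool jump_into_range(const uint8_t *code, uint64_t ip,
			    uint64_t start, uint64_t len)
{
	uint64_t target;

	if (!rel_jump_target(code, ip, &target))
		return false;
	return start <= target && target < start + len;
}

/* Hypothetical stand-in for the __indirect_thunk_start/end check: with
 * CONFIG_RETPOLINE, an indirect jump becomes a jmp into the thunk section,
 * so a target inside that section marks an indirect jump. */
static bool target_in_thunks(uint64_t target, uint64_t thunk_start,
			     uint64_t thunk_end)
{
	return thunk_start <= target && target < thunk_end;
}
```

For example, a two-byte `jmp -16` at 0x1010 lands on 0x1002 and is flagged for a probe range starting at 0x1000, whereas a call whose target is 0x1000 is not examined at all.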
>>> The reason I did mention kCFI, though, is that kCFI has a larger 'indirect jump' sequence, and I'm not sure we've thought about what can go sideways if that's optprobed.
>>
>> If I understand correctly, kCFI checks only indirect function calls (it checks the function pointer), so there are no jump tables. Or does it use an indirect 'jump'?
>
> Yes, it's indirect function calls only. Imagine our function (bar) doing an indirect call; it will (as clang always does) have the function pointer in r11:
>
> bar:
> 	...
> 	movl	$(-0x12345678),%r10d
> 	addl	-15(%r11), %r10d
> 	je	1f
> 	ud2
> 1:	call	__x86_indirect_thunk_r11
>
> And then the function it calls (foo) looks like:
>
> __cfi_foo:
> 	movl	$0x12345678, %eax
> 	.skip	11, 0x90
> foo:
> 	endbr
> 	....
>
> So if the caller (in bar) and the callee (foo) have the same hash value (0x12345678 in this case), the addl yields zero, the je is taken, and we continue on our merry way. However, if they do not match, we'll trip that #UD and handle_cfi_failure() will try to match the address against __{start,stop}__kcfi_traps[]. Additionally, decode_cfi_insn() will try to decode that whole call sequence in order to obtain the target address and typeid (hash).
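The hash comparison in that sequence is plain 32-bit modular arithmetic. The sketch below simulates it in user space, assuming the layout shown above: a 16-byte __cfi_foo (a 5-byte `movl $hash,%eax` plus 11 bytes of 0x90), so the hash immediate sits at foo - 15. The helper names are hypothetical; this models the check, not the compiler's actual output.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of __cfi_<func>: 5-byte movl + 11 bytes of padding. */
#define CFI_PREAMBLE_SIZE 16

/* Build a preamble like __cfi_foo for the given type hash. */
static void build_cfi_preamble(uint8_t pre[CFI_PREAMBLE_SIZE], uint32_t hash)
{
	pre[0] = 0xb8;                          /* movl $imm32, %eax */
	for (int i = 0; i < 4; i++)             /* imm32, little-endian */
		pre[1 + i] = (uint8_t)(hash >> (8 * i));
	memset(pre + 5, 0x90, 11);              /* .skip 11, 0x90 */
}

/* Caller side: movl $(-hash),%r10d; addl -15(%r11),%r10d; je 1f; ud2.
 * 'foo' plays the role of %r11. Returns true if the je would be taken. */
static bool kcfi_check(const uint8_t *foo, uint32_t caller_hash)
{
	const uint8_t *p = foo - 15;            /* -15(%r11): the imm32 */
	uint32_t stored = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
			  ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
	uint32_t r10d = (uint32_t)0 - caller_hash;  /* movl $(-hash), %r10d */

	r10d += stored;                         /* addl wraps modulo 2^32 */
	return r10d == 0;                       /* ZF set: call proceeds, else ud2 */
}
```

With a preamble built for 0x12345678, a caller expecting 0x12345678 passes, while one expecting any other value falls through to the ud2.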
Thank you for the explanation! This helps me!
Optprobes might disturb this code, so neither optprobes nor kprobes (nor any other text instrumentation) should touch the __cfi_FUNC symbol right before FUNC.
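A check like the one below could express that rule. It is a hypothetical sketch, not an existing kprobes API: it assumes the 16-byte preamble from the example above and takes function addresses as input instead of looking them up. Checking a byte range rather than a single address covers both a kprobe (1 byte) and an optprobe, which overwrites a 5-byte jump.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of __cfi_<func>, as in the example sequence. */
#define CFI_PREAMBLE_SIZE 16

/* Hypothetical symbol record: 'addr' is the address of <func> itself,
 * so its __cfi_<func> preamble occupies [addr - 16, addr). */
struct func_sym {
	const char *name;
	uint64_t addr;
};

/* True if the probed bytes [addr, addr + len) overlap any preamble. */
static bool range_touches_cfi_preamble(uint64_t addr, uint64_t len,
				       const struct func_sym *syms, size_t nsyms)
{
	for (size_t i = 0; i < nsyms; i++) {
		uint64_t pre_start = syms[i].addr - CFI_PREAMBLE_SIZE;
		uint64_t pre_end = syms[i].addr;        /* exclusive */

		if (addr < pre_end && addr + len > pre_start)
			return true;
	}
	return false;
}
```

With foo at 0x1010, a 1-byte probe at 0x1001 or a 5-byte patch starting at 0xffd would be rejected, while a probe at foo itself (0x1010) is allowed.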
>>> I suspect the UD2 that's in there will go 'funny' if it's relocated into an optprobe, as in, it'll not be recognised as a CFI fail.
>>
>> UD2 can't be optprobed (or kprobed either) because that could change the dumped BUG address...
>
> Right, same problem here. But could the movl/addl be opt-probed? That would wreck decode_cfi_insn(). Then again, if decode_cfi_insn() fails, we'll get report_cfi_failure_noaddr(), which is less informative.
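The first concern can be modelled as an address-matching problem: the #UD handler recognises a CFI failure only when the trapping address is in the trap table, so a ud2 executed from an optprobe's out-of-line buffer would not match. A simplified sketch, assuming a table of plain absolute addresses (the real __kcfi_traps encoding may differ) and hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

enum ud_kind { UD_CFI_FAILURE, UD_OTHER };

/* Simplified lookup in the spirit of matching against
 * __{start,stop}__kcfi_traps[]: linear scan over absolute addresses. */
static enum ud_kind classify_ud(uint64_t ip, const uint64_t *traps,
				size_t ntraps)
{
	for (size_t i = 0; i < ntraps; i++)
		if (traps[i] == ip)
			return UD_CFI_FAILURE;
	return UD_OTHER;
}

/* If an optprobe copied the ud2 into its out-of-line buffer, the trap
 * fires at buf_addr + (orig_ip - probe_addr) instead of orig_ip. */
static uint64_t relocated_ip(uint64_t orig_ip, uint64_t probe_addr,
			     uint64_t buf_addr)
{
	return buf_addr + (orig_ip - probe_addr);
}
```

A ud2 at 0x100c is classified as a CFI failure, but the same instruction executed from a buffer at 0x9000 (probe at 0x1000) traps at 0x900c and is classified as something else, which is the 'funny' behaviour described above.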
OK, so if that sequence is always expected, I can also prohibit probing it. Or maybe it is better to generalize the API that kprobes uses to access the original instruction, so that decode_cfi_insn() can get the original (non-probed) instruction.
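Both options can be sketched together: decode the hash from the bytes preceding the ud2, and, as a hypothetical version of the "original instruction" API, undo recorded probe patches before decoding. The byte layout (6-byte movl, 4-byte addl, 2-byte je, then ud2) is an assumption derived from the sequence quoted above; struct probe_patch, read_original() and decode_cfi_hash() are hypothetical, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, ending right before the ud2:
 *   41 ba <imm32>   movl $(-hash), %r10d   (6 bytes)
 *   45 03 53 f1     addl -15(%r11), %r10d  (4 bytes)
 *   74 02           je 1f                  (2 bytes) */
#define CFI_SEQ_LEN 12

/* Hypothetical record of bytes a probe overwrote (len <= 8). */
struct probe_patch {
	size_t offset;          /* offset of the patch within 'text' */
	size_t len;
	uint8_t orig[8];        /* bytes before patching */
};

/* Copy text[start, start + len) into 'out', undoing any probe patches. */
static void read_original(const uint8_t *text, size_t start, size_t len,
			  const struct probe_patch *patches, size_t npatches,
			  uint8_t *out)
{
	memcpy(out, text + start, len);
	for (size_t i = 0; i < npatches; i++)
		for (size_t j = 0; j < patches[i].len; j++) {
			size_t pos = patches[i].offset + j;

			if (pos >= start && pos < start + len)
				out[pos - start] = patches[i].orig[j];
		}
}

/* Returns false (the "noaddr" path) if the bytes are not the expected
 * sequence, e.g. because a probe replaced the movl. */
static bool decode_cfi_hash(const uint8_t seq[CFI_SEQ_LEN], uint32_t *hash)
{
	if (seq[0] != 0x41 || seq[1] != 0xba)   /* movl $imm32, %r10d */
		return false;
	if (seq[6] != 0x45 || seq[7] != 0x03)   /* addl r/m32, %r10d */
		return false;
	uint32_t imm = (uint32_t)seq[2] | ((uint32_t)seq[3] << 8) |
		       ((uint32_t)seq[4] << 16) | ((uint32_t)seq[5] << 24);
	*hash = (uint32_t)0 - imm;              /* movl loaded -hash */
	return true;
}
```

If an int3 (0xcc) is written over the movl, decoding the live bytes fails, but decoding the bytes returned by read_original() with the saved original byte recovers 0x12345678 again.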
So it looks like nothing too horrible happens...
Thank you,

-- 
Masami Hiramatsu (Google) [off-list ref]