Thread (16 messages), 3 authors, 2023-07-09

Re: [PATCH 2/2] x86/retpoline,kprobes: Avoid treating rethunk as an indirect jump

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2023-07-07 14:39:27
Also in: lkml

On Thu, 6 Jul 2023 13:34:03 +0200
Peter Zijlstra [off-list ref] wrote:
On Thu, Jul 06, 2023 at 06:00:14PM +0900, Masami Hiramatsu wrote:
On Thu, 6 Jul 2023 09:17:05 +0200
Peter Zijlstra [off-list ref] wrote:
On Thu, Jul 06, 2023 at 09:47:23AM +0900, Masami Hiramatsu wrote:
If I understand correctly, all indirect jumps will be replaced with JMP_NOSPEC.
If you read insn_jump_into_range(), it only checks the jump code, not call.
So functions that only have indirect calls still allow optprobe.
With the introduction of kCFI JMP_NOSPEC is no longer an equivalent to a
C indirect jump.
If I understand correctly, kCFI is enabled by CFI_CLANG, and clang is not
using jump-tables by default, so we can focus on gcc. In that case
the current check still works, correct?
IIRC clang can use jump tables, but like GCC needs RETPOLINE=n and
IBT=n, so effectively nobody has them.
So if it requires RETPOLINE=n, the current __indirect_thunk_start/end check
is not required, right? (That code is wrapped in "#ifdef CONFIG_RETPOLINE".)
Correct.
The reason I did mention kCFI though is that kCFI has a larger 'indirect
jump' sequence, and I'm not sure we've thought about what can go
sideways if that's optprobed.
If I understand correctly, kCFI checks only indirect function calls (it checks
the pointer), so no jump tables. Or does it use an indirect 'jump'?
Yes, it's indirect function calls only.

Imagine our function (bar) doing an indirect call, it will (as clang
always does) have the function pointer in r11:

bar:
	...
	movl	$(-0x12345678),%r10d
	addl	-15(%r11), %r10d
	je	1f
	ud2
1:	call	__x86_indirect_thunk_r11



And then the function it calls (foo) looks like:

__cfi_foo:
	movl	$0x12345678, %eax
	.skip	11, 0x90
foo:
	endbr
	....



So if the caller (in bar) and the callee (foo) have the same hash value
(0x12345678 in this case) then it will be equal and we continue on our
merry way.
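
As a rough illustration of the check described above, here is a minimal
user-space sketch in C. It only models the arithmetic from the two listings:
the 5-byte movl plus 11 bytes of padding put the hash immediate 15 bytes
before the function entry, and adding it to the negated expected hash gives
zero on a match. The names build_preamble() and kcfi_check() are hypothetical
and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KCFI_PREAMBLE_LEN 16	/* 5-byte movl + 11 bytes of 0x90 padding */

/*
 * Build a fake "__cfi_foo" preamble: movl $hash,%eax followed by 11 NOPs.
 * The function entry ("foo") is then buf + KCFI_PREAMBLE_LEN.
 */
static void build_preamble(uint8_t buf[KCFI_PREAMBLE_LEN], uint32_t hash)
{
	buf[0] = 0xb8;				/* opcode of movl $imm32,%eax */
	memcpy(&buf[1], &hash, sizeof(hash));	/* the imm32 hash value */
	memset(&buf[5], 0x90, 11);		/* .skip 11, 0x90 */
}

/*
 * Mimic the caller-side sequence. Returns 1 where the real code would take
 * the "je 1f" branch and 0 where it would fall through to ud2.
 */
static int kcfi_check(const uint8_t *entry, uint32_t expected_hash)
{
	uint32_t r10d = 0u - expected_hash;	/* movl $(-hash),%r10d */
	uint32_t stored;

	assert(entry != NULL);
	memcpy(&stored, entry - 15, sizeof(stored));	/* -15(%r11) */
	r10d += stored;				/* addl -15(%r11),%r10d */
	return r10d == 0;
}
```

Building a preamble with hash 0x12345678 and calling kcfi_check() on
buf + KCFI_PREAMBLE_LEN with the same hash returns 1; any other expected hash
returns 0.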

However, if they do not match, we'll trip that #UD and the
handle_cfi_failure() will try and match the address to
__{start,stop}__kcfi_traps[]. Additionally decode_cfi_insn() will try
and decode that whole call sequence in order to obtain the target
address and typeid (hash).
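
The table lookup mentioned above can be sketched roughly as follows. This is
a user-space model, not the kernel code: it assumes each table entry is a
32-bit offset relative to its own location (an assumption about the layout,
not something stated in this thread), and struct fake_image, record_trap()
and is_trap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for kernel text plus the trap table section. */
struct fake_image {
	uint8_t text[64];
	int32_t traps[2];
};

/* Resolve a self-relative entry to the address it points at. */
static uintptr_t trap_address(const int32_t *p)
{
	return (uintptr_t)((intptr_t)p + *p);
}

/* Store the distance from the table slot to the trapping instruction. */
static void record_trap(int32_t *slot, const uint8_t *insn)
{
	intptr_t delta = (intptr_t)insn - (intptr_t)slot;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*slot = (int32_t)delta;
}

/* Linear search of [start, stop) for an entry that resolves to addr. */
static int is_trap(const int32_t *start, const int32_t *stop, uintptr_t addr)
{
	for (const int32_t *p = start; p < stop; p++)
		if (trap_address(p) == addr)
			return 1;
	return 0;
}
```

In this model, if the ud2 were copied to a different address (as would happen
when an instruction is relocated into an optprobe buffer), is_trap() would
return 0 for the new address, which matches the concern about the failure not
being recognised as a CFI failure.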
Thank you for the explanation! This helps me!
optprobes might disturb this code.
So neither optprobes nor kprobes (nor any other text instrumentation) should
touch the __cfi_FUNC symbols right before FUNC.
I suspect the UD2 that's in there will go 'funny' if it's relocated into
an optprobe, as in, it'll not be recognised as a CFI fail.
UD2 can't be optprobed (nor kprobed) because that can change the dumped
BUG address...
Right, same problem here. But could the movl/addl be opt-probed? That
would wreck decode_cfi_insn(). Then again, if decode_cfi_insn() fails,
we'll get report_cfi_failure_noaddr(), which is less informative.
Ok, so if that sequence is always expected, I can also prohibit probing it.
Or maybe it is better to generalize the API that kprobes uses to access the
original instruction, so that decode_cfi_insn() can get the original
(non-probed) insn.
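
A minimal sketch of that idea, in user-space C: a probe saves the byte it
overwrites with int3 (0xcc), and a reader that wants to decode the original
sequence copies the text and patches the saved bytes back in. The names
struct fake_probe, arm_probe() and read_original() are hypothetical and only
illustrate the shape such an API might take; they are not the existing
kprobes interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xcc

/* Hypothetical record of one probed location. */
struct fake_probe {
	size_t offset;	/* offset of the probed byte in the text */
	uint8_t saved;	/* original byte replaced by int3 */
};

/* Install a breakpoint at text[offset], remembering the original byte. */
static void arm_probe(uint8_t *text, size_t len, struct fake_probe *p,
		      size_t offset)
{
	assert(offset < len);
	p->offset = offset;
	p->saved = text[offset];
	text[offset] = INT3_OPCODE;
}

/* Copy the text into out[] with every probed byte restored. */
static void read_original(const uint8_t *text, size_t len,
			  const struct fake_probe *probes, size_t nprobes,
			  uint8_t *out)
{
	memcpy(out, text, len);
	for (size_t i = 0; i < nprobes; i++)
		if (probes[i].offset < len)
			out[probes[i].offset] = probes[i].saved;
}
```

A decoder working on the buffer filled by read_original() would see the
movl/addl bytes as they were before probing, while the live text keeps its
breakpoint.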
So it looks like nothing too horrible happens...

Thank you,

-- 
Masami Hiramatsu (Google) [off-list ref]