Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
From: Josh Poimboeuf <jpoimboe@kernel.org>
Date: 2022-09-14 00:04:37
Also in:
linux-arm-kernel, linux-toolchains, live-patching, lkml
On Mon, Sep 12, 2022 at 02:17:36PM +0000, Michael Matz wrote:
Hey,
Hi Michael, Thanks for looking at this.
On Mon, 12 Sep 2022, Borislav Petkov wrote:quoted
Micha, any opinions on the below are appreciated. On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:quoted
quoted
difficult to ensure correctness. Also, due to kernel live patching, the kernel relies on 100% correctness of unwinding metadata, whereas the toolchain treats it as a best effort.Unwinding certainly is not best effort. It's 100% reliable as far as the source language or compilation options require. But as it doesn't touch the discussed features I won't belabor that point.
Ok, maybe I had the wrong impression about the reliability of DWARF.
I will mention that objtool's existence is based on mistrust, of persons (not correctly annotating stuff) and of tools (not correctly heeding those annotations). The mistrust in persons is understandable and can be dealt with by tools, but the mistrust in tools can't be fixed by making tools more complicated by emitting even more information; there's no good reason to assume that one piece of info can be trusted more than other pieces. So, if you mistrust the tools you have already lost. That's somewhat philosophical, so I won't beat that horse much more either.
Maybe this is semantics, but I wouldn't characterize objtool's existence as being based on the mistrust of tools. It's main motivation is to fill in the toolchain's blind spots in asm and inline-asm, which exist by design. (Objtool has actually found many compiler bugs, but that's a side benefit and not its reason for existence.) I understand the concern about trusting one piece of info more than others, but we have to trust the toolchain. Also, objtool does a lot of consistency checks, and experience shows that if there's a bug in the existing jump table or noreturn detection logic, it almost always quickly surfaces as an objtool warning: unreachable instruction, stack state mismatch, falling through the end of a function, etc.
Now, recovering the CFG. I'll switch order of your two items: 2) noreturn functionquoted
quoted
.pushsection .annotate.noreturn .quad func1 .quad func2 .quad func3 .popsectionThis won't work for indirect calls to noreturn functions: void (* __attribute__((noreturn)) noretptr)(void); int callnoret (int i) { noretptr(); return i + 32; } The return statement is unreachable (and removed by GCC). To know that you would have to mark the call statements, not the individual functions. All schemes that mark functions that somehow indicates a meaningful difference in the calling sequence (e.g. the ABI of functions) have the same problem: it's part of the call expressions type, not of individual decls. Second problem: it's not extensible. Today it's noreturn functions you want to know, and tomorrow? So, add a flag word per entry, define bit 0 for now to be NORETURN, and see what comes. Add a header with a version (and/or identifier) as well and it's properly extensible. For easy linking and identifying the blobs in the linked result include a length in the header. If this were in an allocated section it would be a good idea to refer to the symbols in a PC-relative manner, so as to not result in runtime relocations. In this case, as it's within a non-alloc section that doesn't matter. So: .section .annotate.functions .long 1 # version .long 0xcafe # ident .long 2f-1f # length 1: .quad func1, 1 # noreturn .quad func2, 1 # noreturn .quad func3, 32 # something_else_but_not_noreturn ... 2: .long 1b-2b # align and "checksum" It might be that the length-and-header scheme is cumbersome if you need to write those section commands by hand, in which case another scheme might be preferrable, but it should somehow be self-delimiting. For the above problem of indirect calls to noreturns, instead do: .text noretcalllabel: call noreturn othercall: call really_special_thing .section .annotate.noretcalls .quad noretcalllabel, 1 # noreturn call .quad othercall, 32 # call to some special(-ABI?) function Same thoughts re extensibility and self-delimitation apply.
Hm, I didn't know noreturn function pointers were a thing. Annotating the call site instead of the function would be fine. I'm thinking PC-relative relocs are a good idea regardless, it makes the binary smaller even if the section isn't allocatable. As far as extending goes, I had been thinking future annotation types would just go in new sections, e.g. .annotate.retpolinecalls, each section with its own format. And that has the benefit of being a simpler and easier to parse format (no headers, versions, lengths, etc). But either way is fine I think.
1) jump tablesquoted
quoted
Create an .annotate.jump_table section which is an array of the following variable-length structure: struct annotate_jump_table { void *indirect_jmp; long num_targets; void *targets[]; };It's very often the case that the compiler already emits what your .targets[] member would encode, just at some unknown place, length and encoding. So you would save space if you instead only remember the encoding and places of those jump tables: struct { void *indirect_jump; long num_tables; struct { unsigned num_entries; unsigned encoding; void *start_of_table; } tables[]; }; The usual encodings are: direct, PC-relative, relative-to-start-of-table. Usually for a specific jump instruction there's only one table, so optimizing for that makes sense. For strange unthought-of cases it's probably a good idea to have your initial scheme as fallback, which could be indicated by a special .encoding value.quoted
quoted
For example, given the following switch statement code: .Lswitch_jmp: // %rax is .Lcase_1 or .Lcase_2 jmp %raxSo, usually %rax would point into a table (somewhere in .rodata/.text) that looks like so: .Ljump_table: .quad .Lcase_1 - .Ljump_table .quad .Lcase_2 - .Ljump_table (for position-independend code) and hence you would emit this as annotation: .quad .Lswitch_jmp .quad 1 # only a single table .long 2 # with two entries .long RELATIVE_TO_START # all entries are X - start_of_table .quad .Ljump_table In this case you won't save anything of course, but as soon as there's a meaningful number of cases you will.
As a user of the data, I would prefer a simpler format (something like my original scheme) which uses more space, rather than needing headers, fallback scheme, encodings, blob lengths, etc just to save some non-allocatable bytes. But the above seems fine. -- Josh