Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

From: Josh Poimboeuf <jpoimboe@kernel.org>
Date: 2022-09-14 00:04:37
Also in: linux-arm-kernel, linux-toolchains, live-patching, lkml

On Mon, Sep 12, 2022 at 02:17:36PM +0000, Michael Matz wrote:

Hey,

Hi Michael,

Thanks for looking at this.

On Mon, 12 Sep 2022, Borislav Petkov wrote:

quoted

Micha, any opinions on the below are appreciated.

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:

quoted

difficult to ensure correctness.  Also, due to kernel live patching, the
kernel relies on 100% correctness of unwinding metadata, whereas the
toolchain treats it as a best effort.

Unwinding certainly is not best effort.  It's 100% reliable as far as the 
source language or compilation options require.  But as it doesn't 
touch the discussed features I won't belabor that point.

Ok, maybe I had the wrong impression about the reliability of DWARF.

I will mention that objtool's existence is based on mistrust, of persons 
(not correctly annotating stuff) and of tools (not correctly heeding those 
annotations).  The mistrust in persons is understandable and can be dealt 
with by tools, but the mistrust in tools can't be fixed by making tools 
more complicated by emitting even more information; there's no good reason 
to assume that one piece of info can be trusted more than other pieces.  
So, if you mistrust the tools you have already lost.  That's somewhat 
philosophical, so I won't beat that horse much more either.

Maybe this is semantics, but I wouldn't characterize objtool's existence
as being based on the mistrust of tools.  It's main motivation is to
fill in the toolchain's blind spots in asm and inline-asm, which exist
by design.

(Objtool has actually found many compiler bugs, but that's a side
benefit and not its reason for existence.)

I understand the concern about trusting one piece of info more than
others, but we have to trust the toolchain.  Also, objtool does a lot of
consistency checks, and experience shows that if there's a bug in the
existing jump table or noreturn detection logic, it almost always
quickly surfaces as an objtool warning: unreachable instruction, stack
state mismatch, falling through the end of a function, etc.

Now, recovering the CFG.  I'll switch order of your two items:

2) noreturn function

quoted

  .pushsection .annotate.noreturn
    .quad func1
    .quad func2
    .quad func3
  .popsection

This won't work for indirect calls to noreturn functions:

  void (* __attribute__((noreturn)) noretptr)(void);
  int callnoret (int i)
  {
    noretptr();
    return i + 32;
  }

The return statement is unreachable (and removed by GCC).  To know that 
you would have to mark the call statements, not the individual functions.  
All schemes that mark functions that somehow indicates a meaningful 
difference in the calling sequence (e.g. the ABI of functions) have the 
same problem: it's part of the call expressions type, not of individual 
decls.

Second problem: it's not extensible.  Today it's noreturn functions you 
want to know, and tomorrow?  So, add a flag word per entry, define bit 0 
for now to be NORETURN, and see what comes.  Add a header with a version 
(and/or identifier) as well and it's properly extensible.  For easy 
linking and identifying the blobs in the linked result include a length in 
the header.  If this were in an allocated section it would be a good idea 
to refer to the symbols in a PC-relative manner, so as to not result in 
runtime relocations.  In this case, as it's within a non-alloc section 
that doesn't matter.  So:

.section .annotate.functions
.long 1       # version
.long 0xcafe  # ident
.long 2f-1f   # length
1:
.quad func1, 1   # noreturn
.quad func2, 1   # noreturn
.quad func3, 32  # something_else_but_not_noreturn
...
2:
.long 1b-2b   # align and "checksum"

It might be that the length-and-header scheme is cumbersome if you need to 
write those section commands by hand, in which case another scheme might 
be preferrable, but it should somehow be self-delimiting.

For the above problem of indirect calls to noreturns, instead do:

  .text
  noretcalllabel:
    call noreturn
  othercall:
    call really_special_thing
  .section .annotate.noretcalls
  .quad noretcalllabel, 1  # noreturn call
  .quad othercall, 32      # call to some special(-ABI?) function

Same thoughts re extensibility and self-delimitation apply.

Hm, I didn't know noreturn function pointers were a thing.  Annotating
the call site instead of the function would be fine.

I'm thinking PC-relative relocs are a good idea regardless, it makes the
binary smaller even if the section isn't allocatable.

As far as extending goes, I had been thinking future annotation types
would just go in new sections, e.g. .annotate.retpolinecalls, each
section with its own format.  And that has the benefit of being a
simpler and easier to parse format (no headers, versions, lengths, etc).
But either way is fine I think.

1) jump tables

quoted

Create an .annotate.jump_table section which is an array of the
following variable-length structure:

  struct annotate_jump_table {
	void *indirect_jmp;
	long num_targets;
	void *targets[];
  };

It's very often the case that the compiler already emits what your 
.targets[] member would encode, just at some unknown place, length and 
encoding.  So you would save space if you instead only remember the 
encoding and places of those jump tables:

struct {
  void *indirect_jump;
  long num_tables;
  struct {
    unsigned num_entries;
    unsigned encoding;
    void *start_of_table;
  } tables[];
};

The usual encodings are: direct, PC-relative, relative-to-start-of-table.  
Usually for a specific jump instruction there's only one table, so 
optimizing for that makes sense.  For strange unthought-of cases it's 
probably a good idea to have your initial scheme as fallback, which could 
be indicated by a special .encoding value.

quoted

For example, given the following switch statement code:

  .Lswitch_jmp:
	// %rax is .Lcase_1 or .Lcase_2
	jmp %rax

So, usually %rax would point into a table (somewhere in .rodata/.text) 
that looks like so:

.Ljump_table:
 .quad .Lcase_1 - .Ljump_table
 .quad .Lcase_2 - .Ljump_table

(for position-independend code)

and hence you would emit this as annotation:

.quad .Lswitch_jmp
.quad 1                   # only a single table
.long 2                   # with two entries
.long RELATIVE_TO_START   # all entries are X - start_of_table
.quad .Ljump_table

In this case you won't save anything of course, but as soon as there's a 
meaningful number of cases you will.

As a user of the data, I would prefer a simpler format (something like
my original scheme) which uses more space, rather than needing headers,
fallback scheme, encodings, blob lengths, etc just to save some
non-allocatable bytes.  But the above seems fine.

-- 
Josh

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help