Thread (15 messages) 15 messages, 4 authors, 2020-07-09

Re: [PATCH] arm64/alternatives: use subsections for replacement sequences

From: Dave Martin <Dave.Martin@arm.com>
Date: 2020-07-07 10:37:32

On Mon, Jul 06, 2020 at 07:04:44PM +0300, Ard Biesheuvel wrote:
On Mon, 6 Jul 2020 at 18:50, Dave Martin [off-list ref] wrote:
quoted
On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote:
quoted
On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel [off-list ref] wrote:
quoted
On Wed, 1 Jul 2020 at 19:01, Dave P Martin [off-list ref] wrote:
quoted
On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote:
quoted
When building very large kernels, the logic that emits replacement
sequences for alternatives fails when relative branches are present
in the code that is emitted into the .altinstr_replacement section
and patched in at the original site and fixed up. The reason is that
the linker will insert veneers if relative branches go out of range,
and due to the relative distance of the .altinstr_replacement from
the .text section where its branch targets usually live, veneers
may be emitted at the end of the .altinstr_replacement section, with
the relative branches in the sequence pointed at the veneers instead
of the actual target.

The alternatives patching logic will attempt to fix up the branch to
point to its original target, which will be the veneer in this case,
but given that the patch site is likely to be far away as well, it
will be out of range and so patching will fail. There are other cases
where these veneers are problematic, e.g., when the target of the
branch is in .text while the patch site is in .init.text, in which
case putting the replacement sequence inside .text may not help either.

So let's use subsections to emit the replacement code as closely as
possible to the patch site, to ensure that veneers are only likely to
be emitted if they are required at the patch site as well, in which
case they will be in range for the replacement sequence both before
and after it is transported to the patch site.

This will prevent alternative sequences in non-init code from being
released from memory after boot, but this is tolerable given that the
entire section is only 512 KB on an allyesconfig build (which weighs in
at 500+ MB for the entire Image). Also, note that modules today carry
the replacement sequences in non-init sections as well, and any of
those that target init code will be emitted into init sections after
this change.

This fixes an early crash when booting an allyesconfig kernel on a
system where any of the alternatives sequences containing relative
branches are activated at boot (e.g., ARM64_HAS_PAN on TX2)

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/alternative.h | 16 ++++++++--------
 arch/arm64/kernel/vmlinux.lds.S      |  3 ---
 2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index 5e5dc05d63a0..12f0eb56a1cc 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
      ".pushsection .altinstructions,\"a\"\n"                         \
      ALTINSTR_ENTRY(feature)                                         \
      ".popsection\n"                                                 \
-     ".pushsection .altinstr_replacement, \"a\"\n"                   \
+     ".subsection 1\n"                                               \
This uses subsections in existing sections.  Could that interfere with
existing (or future) uses of subsections?  (I've not checked whether
there actually are such uses.  I'm also assuming that clobbering the
invoker's idea of what section is .previous doesn't matter.)
It shouldn't matter, really. You can use different indexes for the
subsection, but since the execution never flows from one subsection
into the next, all that matters is that they are 'somewhere else'

As for the use of .previous - the idea is that this does not affect
the contents of the section stack, which I think makes sense. We could
use '.pushsection .text, 1' as well to enter another subsection in
.text, but that means we keep the .text vs .init.text issue that this
patch solves.
The following works:

        .pushsection junk
        .previous
        .subsection foo
                // ...

        .popsection

though the gas documentation is not very clear about the relationship
between .previous and the section stack directives.  In fact, each stack
slot has its own notion of .previous.  If this trick is too uncertain,
we can probably do without it though.  Relying on .previous after
invoking a macro is ill-advised anyway, and I haven't seen this issue
come up in practice.
There is some mention about .previous only affecting the slot at the
top of the stack, but it is rather vague.
quoted
Since foo is just a number, maybe we could just pick a randomish value,
similarly to the way we "manage" asm local labels generated by macros,
leaving small-integer values reserved for top-level code?  This might
help prevent surprises later on.

In general it's reasonable to use subsections to consolidate things
that should be kept close but otherwise contiguous in the output
object, such as collecting together entries for a jump table.  If a .S
file and macros it calls are both using subsection 1, the macros might
spit out garbage into the middle of the table the .S file is trying to
build for example.  However, I don't see any obvious evidence of that
kind of thing in arch/arm64, and nothing in core code (no surprise
there, this is asm).
All uses of subsections I am aware just use it to emit code out of
line, with a label at the start and a branch at the end, and the only
reason is to remove it from the hot path. For that kind of use, the
only requirement for the subsection index is that it is != 0.

If you are doing smart things with subsections and expect some
consistency in how they are organized at the end of the object file,
you might care about the subsection index, but I don't see a reason to
do so here.
If we used .subsection 77, say, then a conflict would be highly
unlikely.  But I'd agree that this might just cause more confusion in
the long run (if it ever matters).
quoted
quoted
quoted
quoted
Another wrinkle: the replacement code now becomes executable, whereas
I think it was previously in rodata.  I'm not sure how much this
matters, but it might be a source of gadgets.
True. Perhaps we need to get rid of relative branches in alternative
sequences entirely - see below.
quoted
A different option would be to add an explicitly veneered branch macro
for use in alternatives, maybe adrp+add+br.  For BTI compatility, we'd
need a bti j or equivalent at the destination, which might or might not
be easy to achieve -- mind you, I think we theoretically need that
anyway for veneers to work properly in all cases.

Because we would define the exact instruction sequence, the
alternatives code could probably replace it with a direct branch if the
actual destination is close enough.  The downside is that it wouldn't
be a single instruction any more, and there would be some overhead for
conditional branches if we replace the unneeded insns with NOPs.
Yeah, this becomes quite hairy very quickly, especially because now
you need to allocate a spare register each time you do this.
which is not ideal, I agree.

Do we anticipate any truly out-of-range branches, or are we assuming the
kernel text + modules area is always compact enough that direct
branching always works?
There are two realistic failure modes, afaict:
- *Really* large kernels, such as allyesconfig, which is kind of
artificial but still relevant for validation purposes
- Occurrences where the module region is almost exhausted, to the
point where the next module's non-init segment no longer fits, but the
init section does. In this case, any alternative sequences with
branches that need to be patched into the init text section may be out
of range for their targets (which could be actual functions or PLTs
that were emitted into the non-init section)

quoted
quoted
quoted
One other option is to simply disallow branches in the alternative
sequences: I spotted three occurrences [0] that are quite easily
fixed, by inverting the condition so that the relative branch is
emitted in place, and the alternative sequence is just NOPs.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches
That might be a nice simplification if there's no significant
performance impact.
Actually, the PAN code could be improved a bit more - I just sent a
patch - but in general, disallowing such branches would be a nice
simplification but I am not sure we can easily enforce it at build
time.
We could scan for problem relocs when linking, but that would introduce
an extra build step.  So I'd guess that wouldn't be popular just for
this.

Cheers
---Dave

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help