Re: [PATCH v3 00/36] arm64/gcs: Provide support for GCS in userspace

From: Will Deacon <will@kernel.org>
Date: 2023-08-10 09:40:28
Also in: kvmarm, linux-arm-kernel, linux-doc, linux-fsdevel, linux-kselftest, linux-mm, linux-riscv, lkml

On Tue, Aug 08, 2023 at 09:25:11PM +0100, Mark Brown wrote:

On Tue, Aug 08, 2023 at 02:38:58PM +0100, Will Deacon wrote:

quoted

But seriously, I think the question is more about what this brings us
*on top of* SCS, since for the foreseeable future folks that care about
this stuff (like Android) will be using SCS. GCS on its own doesn't make
sense to me, given the recompilation effort to remove SCS and the lack
of hardware, so then you have to look at what it brings in addition to
SCS and balance that against the performance cost.

quoted

Given that, is anybody planning to ship a distribution with this enabled?

I'm not sure that your assumption that the only people who would
consider deploying this are those who have deployed SCS is a valid one,
SCS users are definitely part of the mix but GCS is expected to be much
more broadly applicable.  As you say SCS is very invasive, requires a
rebuild of everything with different code generated and as Szabolcs
outlined has ABI challenges for general distros.  Any code built (or
JITed) with anything other than clang is going to require some explicit
support to do SCS (eg, the kernel's SCS support does nothing for
assembly code) and there's a bunch of runtime support.  It's very much a
specialist feature, mainly practical in well controlled somewhat
vertical systems - I've not seen any suggestion that general purpose
distros are considering using it.

I've also seen no suggestion that general purpose distros are considering
GCS -- that's what I'm asking about here, and also saying that we shouldn't
rush in an ABI without confidence that it actually works beyond unit tests
(although it's great that you wrote selftests!).

In contrast in the case of GCS one of the nice features is that for most
code it's very much non-invasive, much less so than things like PAC/BTI
and SCS, which means that the audience is much wider than it is for SCS
- it's a *much* easier sell for general purpose distros to enable GCS
than to enable SCS.

This sounds compelling, but has anybody tried running significant parts of a
distribution (e.g. running Debian source package tests, booting Android,
using a browser, running QEMU) with GCS enabled? I can well imagine
non-trivial applications violating both assumptions of the architecture and
the ABI.

For the majority of programs all the support that is needed is in the
kernel and libgcc/libc, there's no impact on the code generation.  There
are no extra instructions in the normal flow which will impact systems
without the feature, and there are no extra registers in use, so even if
the binaries are run on a system without GCS or for some reason someone
decides that it's best to turn the feature off on a system that is capable
of using it the fact that it's just using the existing bl/ret pairs means
that there is minimal overhead.  This all means that it's much more
practical to deploy in general purpose distros.  On the other hand when
active it affects all code, this improves coverage but the improved
coverage can be a worry.

I can see that systems that have gone through all the effort of enabling
SCS might not rush to implement GCS, though there should be no harm in
having the two features running side by side beyond the doubled memory
requirements so you can at least have a transition plan (GCS does have
some allowances which enable hardware to mitigate some of the memory
bandwidth requirements at least).  You do still get the benefit of the
additional hardware protections GCS offers, and the coverage of all
branch and ret instructions will be of interest both for security and
for unwinders.  It definitely offers less of an incremental
improvement on top of SCS than it does without SCS though.

GCS and SCS are comparable features in terms of the protection they aim
to add but their system integration impacts are different.

Again, this sounds plausible but I don't see any data to back it up so I
don't really have a feeling as to how true it is.

quoted

If not, why are we bothering? If so, how much of that distribution has
been brought up and how does the "dynamic linker or other startup code"
decide what to do?

There is active interest in the x86 shadow stack support from distros,
GCS is a lot earlier on in the process but isn't fundamentally different
so it is expected that this will translate.  There is also a chicken and
egg thing where upstream support gates a lot of people's interest, what
people will consider carrying out of tree is different to what they'll
enable.

I'm not saying we should wait until distros are committed, but Arm should
be able to do that work on a fork, exactly like we did for the arm64
bringup. We have the fastmodel, so running interesting stuff with GCS
enabled should be dead easy, no?

Architecture specific feedback on the implementation can also be fed back
into the still ongoing review of the ABI that is being established for
x86, there will doubtless be pushback about variations between
architectures from userspace people.

The userspace decision about enablement will primarily be driven by an
ELF marking which the dynamic linker looks at to determine if the
binaries it is loading can support GCS, a later dlopen() can either
refuse to load an additional library if the process currently has GCS
enabled, ignore the issue and hope things work out (there's a good
chance they will but obviously that's not safe) or (more complicatedly)
go round all the threads and disable GCS before proceeding.  The main
reason any sort of rebuild is required for most code is to add the ELF
marking, there will be a compiler option to select it.  Static binaries
should know if everything linked into them is GCS compatible and enable
GCS if appropriate in their startup code.
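
The enablement decisions described above can be sketched roughly as
follows. This is a minimal illustration only, not code from the series
or from any dynamic linker: every identifier here (gcs_enable_at_startup,
gcs_dlopen_decide, the enum names) is hypothetical, and the ELF marking
is assumed to have already been reduced to one boolean per loaded object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: none of these come from the kernel,
 * a libc, or the patch series under discussion. */

/* The three dlopen() behaviours described in the text. */
enum gcs_dlopen_policy {
    GCS_POLICY_REFUSE,   /* refuse to load an unmarked library */
    GCS_POLICY_IGNORE,   /* load anyway and hope it works (not safe) */
    GCS_POLICY_DISABLE   /* disable GCS on every thread, then load */
};

enum gcs_dlopen_action {
    GCS_ACTION_LOAD,              /* nothing special needed */
    GCS_ACTION_LOAD_UNSAFE,       /* loaded despite missing marking */
    GCS_ACTION_REJECT,            /* dlopen() should fail */
    GCS_ACTION_DISABLE_THEN_LOAD  /* turn GCS off first, then load */
};

/* Startup decision: enable GCS only if every object carries the marking.
 * An empty list is treated conservatively as "do not enable". */
bool gcs_enable_at_startup(const bool *marked, size_t count)
{
    if (count == 0)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (!marked[i])
            return false;
    }
    return true;
}

/* Later dlopen(): the policy only matters when GCS is currently enabled
 * and the new library lacks the marking. */
enum gcs_dlopen_action gcs_dlopen_decide(bool gcs_enabled, bool lib_marked,
                                         enum gcs_dlopen_policy policy)
{
    if (!gcs_enabled || lib_marked)
        return GCS_ACTION_LOAD;

    switch (policy) {
    case GCS_POLICY_REFUSE:
        return GCS_ACTION_REJECT;
    case GCS_POLICY_IGNORE:
        return GCS_ACTION_LOAD_UNSAFE;
    case GCS_POLICY_DISABLE:
        return GCS_ACTION_DISABLE_THEN_LOAD;
    }
    return GCS_ACTION_REJECT; /* unknown policy: fail closed */
}
```

The sketch only models the choice itself. How a real loader would read
the marking, or go round the threads to disable GCS, is not shown and is
not specified in this message.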

The majority of the full distro work at this point is on the x86 side
given the hardware availability, we are looking at that within Arm of
course.  I'm not aware of any huge blockers we have encountered thus
far.

Ok, so it sounds like you've started something then? How far have you got?

It is fair to say that there's less active interest on the arm64 side
since as you say the feature is quite a way off making its way into
hardware, though there are also long lead times on getting the full
software stack to end users and kernel support becomes a blocker for
the userspace stack.

quoted

After the mess we had with BTI and mprotect(), I'm hesitant to merge
features like this without knowing that the ABI can stand real code.

The equivalent x86 feature is in current hardware[1], there has been
some distro work (I believe one of the issues x86 has had is coping with
a distro which shipped an early out of tree ABI, that experience has
informed the current ABI which as the cover letter says we are following
closely).  AIUI the biggest blocker on userspace work for x86 right now
is landing the kernel side of things so that everyone else has a stable
ABI to work from and doesn't need to carry out of tree patches, I've heard
frustration expressed at the deployment being held up.  IIRC Fedora were
on the leading edge in terms of active interest, they tend to be given
that they're one of the most quickly iterating distros.  

This definitely does rely fairly heavily on the x86 experience for
confidence in the ABI, and to be honest one of the big unknowns at this
point is if you or Catalin will have opinions on how things are being
done.

While we'd be daft not to look at what the x86 folks are doing, I don't
think we should rely solely on them to inform the design for arm64 when
it should be relatively straightforward to prototype the distro work on
the model. There's also no rush to land the kernel changes given that
GCS hardware doesn't exist.

Will
