Re: [PATCH v9 23/42] Documentation/x86: Add CET shadow stack description

From: szabolcs.nagy@arm.com <hidden>
Date: 2023-07-03 18:19:51
Also in: linux-arch, linux-doc, linux-mm, lkml

The 07/02/2023 18:03, Edgecombe, Rick P wrote:

On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@arm.com wrote:

quoted

The 06/22/2023 23:18, Edgecombe, Rick P wrote:

quoted

I'd also appreciate if you could spell out exactly which:
 - ucontext
 - signal
 - longjmp
 - custom library stack switching

patterns you think shadow stack should support working together.
Because even after all these mails, I'm still not sure exactly what
you
are trying to achieve.

Hi Szablocs,

Thanks for writing all this up. It is helpful to understand where you
are coming from. Please don't miss my point at the very bottom of this
response.

quoted

i'm trying to support two operations (in any combination):

(1) jump up the current (active) stack.

(2) jump to a live frame in a different inactive but live stack.
    the old stack becomes inactive (= no task executes on it)
    and live (= has valid frames to jump to).

with

(3) the runtime must manage the shadow stacks transparently.
    (= portable c code does not need modifications)

mapping this to c apis:

- swapcontext, setcontext, longjmp, custom stack switching are jump
  operations. (there are conditions under which (1) and (2) must
work,
  further details don't matter.)

- makecontext creates an inactive live stack.

- signal is only special if it executes on an alt stack: on signal
  entry the alt stack becomes active and the interrupted stack
  inactive but live. (nested signals execute on the alt stack until
  that is left either via a jump or signal return.)

- unwinding can be implemented with jump operations (it needs some
  other things but that's out of scope here).

the patterns that shadow stack should support falls out of this
model.
(e.g. posix does not allow jumping from one thread to the stack of a
different thread, but the model does not care about that, it only
cares if the target stack is inactive and live then jump should
work.)

some observations:

- it is necessary for jump to detect case (2) and then switch to the
  target shadow stack. this is also sufficient to implement it.
(note:
  the restore token can be used for detection since that is
guaranteed
  to be present when user code creates an inactive live stack and is
  not present anywhere else by design. a different marking can be
used
  if the inactive live stack is created by the kernel, but then the
  kernel has to provide a switch method, e.g. syscall. this should
not
  be controversial.)

For x86's shadow stack you can jump to a new stack without leaving a
token behind. I don't know if maybe we could make it a rule in the
x86_64 ABI that you should always leave a token if you are going to
mark the SHSTK elf bit. But if anything did this, then longjmp() could
never make it back to the stack where setjmp() was called without
kernel help.

ok. i didn't know that.

the restore token means the stack is valid to execute on later. so
from libc pov if the token is missing then jumping to that stack is
simply ub. (the kernel could help working around missing tokens but
presumably the point of not adding a token is exactly to prevent
any kind of jumps to that stack later).

so i don't think this changes much. (other than the x86 model has
inactive dead stacks that cannot be jumped to, and this is enforced.
but libc jumps are not supposed to leave such dead stacks behind so
libc jumps would have to add the restore token when doing a switch.)

quoted

- in this model two live stacks cannot use the same shadow stack
since
  jumping between the two stacks is allowed in both directions, but
  jumping within a shadow stack only works in one direction. (also
two
  tasks could execute on the same shadow stack then. and it makes
  shadow stack size accounting problematic.)

- so sharing shadow stack with alt stack is broken. (the model is
  right in the sense that valid posix code can trigger the issue.

Could you spell out what "the issue" is that can be triggered?

i meant jumping back from the main to the alt stack:

in main:
setup sig alt stack
setjmp buf1
	raise signal on first return
	longjmp buf2 on second return

in signal handler:
setjmp buf2
	longjmp buf1 on first return
	can continue after second return

in my reading of posix this is valid (and works if signals are masked
such that the alt stack is not clobbered when jumping away from it).

but cannot work with a single shared shadow stack.

quoted

 we
  can ignore that corner case and adjust the model so the shared
  shadow stack works for alt stack, but it likely does not change the
  jump design: eventually we want alt shadow stack.)

As we discussed previously, alt shadow stack can't work transparently
with existing code due to the sigaltstack API. I wonder if maybe you
are trying to get at something else, and I'm not following.

i would like a jump design that works with alt shadow stack.

...

So since they are tied together, and I thought to hold off on it for
now. I don't want to try to squeeze around the upstream userspace, I
think a version 2 should be a clean slate on a new elf bit.

ok.

quoted

i understand that the proposed linux abi makes most existing binaries
with shstk marking work, which is relevant for x86.

for a while i thought we can fix the remaining issues even if that
means breaking existing shstk binaries (just bump the abi marking).
now it seems the issues can only be addressed in a future abi break.

Adding a new arch_prctl() ENABLE value was the plan. Not sure what you
mean by ABI break vs version bump. The plan was to add the new features
without userspace regression by putting any behavior behind a different
enable option. This relies on userspace to add a new elf bit, and to
use it.

yes future abi break was the wrong wording: new abi with new elf bit
is what i meant. i.e. x86 will have an abi v1 and abi v2, where v2
abi is not compatible e.g. with v1 unwinder.

quoted

which means x86 linux will likely end up maintaining two incompatible
abis and the future one will need user code and build system changes,
not just runtime changes. it is not a small incremental change to add
alt shadow stack support for example.

i don't think the maintenance burden of two shadow stack abis is the
right path for arm64 to follow, so the shadow stack semantics will
likely become divergent not common across targets.

Unfortunately we are at a bit of an information asymmetry here because
the ARM spec and patches are not public. It may be part of the cause of
the confusion.

yes that's unfortunate. but in this case i just meant that arm64
does not have existing marked binaries to worry about. so it seems
wrong to do a v1 abi and later a v2 abi to fix that up.

It kind of sounds like you don't like the x86 glibc implementation. And
you want to make sure the kernel can support whatever a new solution is
that you are working on. I am on board with the goal of having some
generic set of rules to make portable code work for other architectures
shadow stacks. But I think how close we can get to that goal or what it
looks like is an open question. For several reasons:
 1. Not everyone can see all the specs
 2. No POCs have been done (or at least shared)
 3. It's not clear what needs to be supported (yes, I know you have 
    made a rough proposal here, but it sounds like on the x86 glibc 
    side at least it's not even clear what non-shadow stack stack 
    switching operations can work together)

But towards these goals, I think your technical requests are:

1. Leave a token on switching to an alt shadow stack. As discussed
earlier, we can't do this because of the overflow issues. Also since,

i don't think the overflow discussion was conclusive.

the kernel could modify the top entry instead of adding a new token.
and provide a syscall to switch to that top entry undoing the
modification.

there might be cornercases and likely not much space for encoding
a return address and special marking in an entry. but otherwise
this makes jumping to an overflowed shadow stack work.

but it is also a valid design to just not support jumping out of alt
stack in the specific case of a shadow stack overflow (treat that as
a fatal error), but still allow the jump in less fatal situations.

alt shadow stack cannot be transparent to existing software anyway, it

maybe not in glibc, but a libc can internally use alt shadow stack
in sigaltstack instead of exposing a separate sigaltshadowstack api.
(this is what a strict posix conform implementation has to do to
support shadow stacks), leaking shadow stacks is not a correctness
issue unless it prevents the program working (the shadow stack for
the main thread likely wastes more memory than all the alt stack
leaks. if the leaks become dominant in a thread the sigaltstack
libc api can just fail).

should be ok to introduce limitations. So I think this one is a no.
What we could do is introduce security weakening kernel helpers, but
this would make sense to come with alt shadow stack support.

yes this would be for abi v2 on x86.

2. Add an end token at the top of the shadow stack. Because of the
existing userspace restriction interactions, this is complicated to
evaluate but I think we *could* do this now. There are pros and cons.

yes, this is a minor point. (and ok to do differently across targets)

3. Support more options for shadow stack sizing. (I think you are
referring to this conversation:
https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@arm.com/ (local)). I don't see
why this is needed for the base implementation. If ARM wants to add a
new rlimit or clone variant, I don't see why x86 can't support it
later.

i think it can be added later.

but it may be important for deployment on some platforms, since a
libc (or other language runtime) may want to set the shadow stack
size differently than the kernel default, because

- languages allocating large arrays on the stack
  (too big shadow stack can cause OOM with overcommit off and
  rlimits can be hit like RLIMIT_DATA, RLIMIT_AS because of it)

- tiny thread stack but big sigaltstack (musl libc, go).

So if we add 2, are you satisfied? Or otherwise, on the non-technical
request side, are you asking to hold off on x86 shadow stack, in order
to co-develop a unified solution?

well a unified solution would need a v2 abi with new elf bit,
and it seems the preference is a v1 abi first, so at this point
i'm just trying to understand the potential brokenness in v1
and possible solutions for v2 since if there is a v2 i would
like that to be compatible across targets.

I think the existing solution will see use in the meantime, including
for the development of all the x86 specific JIT implementations.

i think some of that work have to be redone if there is a v2 abi,
which is why i thought having 2 abis is too much maintenance work.

And finally, what I think is the most important point in all of this:

I think that *how* it gets used will be a better guide for further
development than us debating. For example the main pain point that has
come up so far is the problems around dlopen(). And the future work
that has been scoped has been for the kernel to help out in this area.
This is based on _user_ (distro) requests.

i think dlopen (and how it is used) is part of the api/abi design.

in presence of programs that can load any library (e.g. python exe)
it is difficult to make use of shstk without runtime disable.

however if dlopen gets support for runtime disable, reducing the
number of incompatible libraries over time is still relevant,
which requires abi/api design that allows that.

Any apps that don't work with shadow stack limitations can simply not
enable shadow stack. You and me are debating these specific API
combinations, but we can't know whether they are actually the best
place to focus development efforts. And the early signs are this is NOT
the most important problem to solve.

disabling shadow stack is not simple (removing the elf marking from
an exe is often not possible/appropriate, but using an env var that
affects an entire process tree has problems too.) but i don't have
a solution for this (it is likely a userspace issue).

debating the api issues was at least useful for me to understand
what can go into the x86 v1 abi and what may go into a v2 abi.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help