Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl... | linux-arch

Re: [NEEDS-REVIEW] Re: [PATCH v11 25/25] x86/cet/shstk: Add arch_prctl functions for shadow stack

From: "Andy Lutomirski" <luto@kernel.org>
Date: 2021-09-20 16:51:49
Also in: linux-api, linux-doc, linux-mm, lkml



On Mon, Sep 13, 2021, at 6:33 PM, Edgecombe, Rick P wrote:

On Mon, 2020-09-14 at 11:31 -0700, Andy Lutomirski wrote:


On Sep 14, 2020, at 7:50 AM, Dave Hansen wrote:

On 9/11/20 3:59 PM, Yu-cheng Yu wrote:
...


Here are the changes if we take the mprotect(PROT_SHSTK) approach.
Any comments/suggestions?

I still don't like it. :)

I'll also be much happier when there's a proper changelog to accompany
this which also spells out the alternatives and why they suck so much.

Let’s take a step back here. Ignoring the precise API, what exactly is
a shadow stack from the perspective of a Linux user program?

The simplest answer is that it’s just memory that happens to have
certain protections.  This enables all kinds of shenanigans.  A
program could map a memfd twice, once as shadow stack and once as
non-shadow-stack, and change its control flow.  Similarly, a program
could mprotect its shadow stack, modify it, and mprotect it back.  In
some threat models, this could be seen as a WRSS bypass.  (Although
if an attacker can coerce a process to call mprotect(), the game is
likely mostly over anyway.)

But we could be more restrictive, or perhaps we could allow user code
to opt into more restrictions.  For example, we could have shadow
stacks be special memory that cannot be written from usermode by any
means other than ptrace() and friends, WRSS, and actual shadow stack
usage.

What is the goal?

No matter what we do, the effects of calling vfork() are going to be a
bit odd with SHSTK enabled.  I suppose we could disallow this, but
that seems likely to cause its own issues.

Hi,

Resurrecting this old thread to highlight a consequence of the design
change that came out of it. I am going to be taking over this series
from Yu-cheng, and wanted to check if people would be interested in
revisiting this interface.

The consequence I wanted to highlight is that making userspace be
responsible for mapping memory as shadow stack also requires moving
the writing of the restore token to userspace for glibc ucontext
operations. Since these operations involve creating/pivoting to new
stacks in userspace, ucontext CET support also involves creating a new
shadow stack. For normal thread stacks, the kernel has always done the
shadow stack allocation and so it is never writable (in the normal
sense) from userspace. But after this change makecontext() now first
has to mmap() writable memory, then write the restore token, then
mprotect() it as shadow stack. See the glibc changes to support
PROT_SHADOW_STACK here[0].

The writable window leaves an opening for an attacker to create an
arbitrary shadow stack that could be pivoted to later by tweaking the
ucontext_t structure. To try to see how much this matters, we have done
a small test that uses this window to ROP from writes in another
thread during the makecontext()/setcontext() window. (offensive work
credit to Joao on CC). This would require a real app to already be
using ucontext in the course of normal runtime.

My general opinion here (take this with a grain of salt -- I haven't paged back in every single detail) is that the kernel should make it straightforward for a libc to do the right thing without nasty races, cross-thread coordination, or unnecessary permission to write to the stack.  I *also* think that it should be possible for userspace to manage its own shadow stack allocation if it wants to, since I'm sure there will be JIT or green thread or other use cases that want to do crazy things that we fail to anticipate with in-kernel magic.

So perhaps we should keep the explicit allocation and free operations, have a way to opt in to WRSS being flipped on, but also do our best to have APIs that handle the known cases well.

Does that make sense?  Can we have both approaches work in the same kernel?
