Re: [PATCH RFT v3 0/5] fork: Support shadow stacks in clone3()

From: Szabolcs Nagy <hidden>
Date: 2023-11-21 12:22:17
Also in: linux-kselftest, lkml

The 11/21/2023 11:17, Christian Brauner wrote:

On Mon, Nov 20, 2023 at 11:54:28PM +0000, Mark Brown wrote:

quoted

The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1].  With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses.  This provides some
protection against ROP attacks and making it easier to collect call
stacks.  These shadow stacks are allocated in the address space of the
userspace process.

Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled.  The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread.  This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces.  As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.

Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner

So while I made clone3() readily extensible I don't want it to ever
devolve into a fancier version of a prctl().

I would really like to see a strong reason for allowing userspace to
configure the shadow stack size at this point in time.

I have a few questions that are probably me just not knowing much about
shadow stacks so hopefully I'm not asking you write a thesis by
accident:

(1) What does it mean for a shadow stack to be over allocated and is
    over-allocation really that much of a problem out in the wild that
    we need to give I userspace a knob to control a kernel security
    feature?

over-allocation: allocating 100M shadow stack (RLIMIT_DATA, RLIMIT_AS)
for a thread that keeps arrays of data on the stack, instead of say 8k,
and thus running into resource limits.

under-allocation: small thread stack with runaway recursion, but large
sigaltstack: the stack overflow handler can run out of space because it
uses the same shadow stack as the thread.

(2) With what other interfaces is implicit allocation and deallocation
    not consistent? I don't understand this argument. The kernel creates
    a shadow stack as a security measure to store return addresses. It
    seems to me exactly that the kernel should implicitly allocate and
    deallocate the shadow stack and not have userspace muck around with
    its size?

the kernel is not supposed to impose stack size policy or a particular
programming model that limits the stack management options nor prevent
the handling of stack overflows.

(3) Why is it safe for userspace to request the shadow stack size? What
    if they request a tiny shadow stack size? Should this interface
    require any privilege?

user can allocate huge or tiny stacks already.

and i think userspace can take control over shadow stack management:
it can disable signals, start a clone child with stack_size == 1 page,
map_shadow_stack and switch to it, enable signals. however this is
complicated, leaks 1 page of kernel allocated shadow stack (+reserved
guard page, i guess userspace could unmap, not sure if that works
currently) and requires additional syscalls.

(4) Why isn't the @stack_size argument I added for clone3() enough?
    If it is specified can't the size of the shadow stack derived from it?

shadow stack only contains return addresses so it is proportional
to the number of stack frames, not the stack size and it must
account for sigaltstack too, not just the thread stack.

if you make minimal assumptions about stack usage and ignore the
sigaltstack issue then the worst case shadow stack requirement
is indeed proportional to the stack_size, but this upper bound
can be pessimistic and userspace knows the tradeoffs better.

And my current main objection is that shadow stacks were just released
to userspace. There can't be a massive amount of users yet - outside of
maybe early adopters.

no upstream libc has code to enable shadow stacks at this point
so there are exactly 0 users in the open. (this feature requires
runtime support)

the change is expected to allow wider deployability. (e.g. not
just in glibc)

The fact that there are other architectures that bring in a similar
feature makes me even more hesitant. If they have all agreed _and_
implemented shadow stacks and have unified semantics then we can
consider exposing control knobs to userspace that aren't implicitly
architecture specific currently.

So I don't have anything against the patches per obviously but with the
wider context.

Thanks!

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help