Re: [PATCH RFC RFT 2/5] fork: Add shadow stack support to clone3()

From: Mark Brown <broonie@kernel.org>
Date: 2023-10-23 18:32:16
Also in: linux-kselftest, lkml

On Mon, Oct 23, 2023 at 04:32:22PM +0000, Edgecombe, Rick P wrote:

On Mon, 2023-10-23 at 14:20 +0100, Mark Brown wrote:

+Some security folks

I *think* I captured everyone for future versions but I might've missed
some, it's a long Cc list.

quoted

Add parameters to clone3() specifying the address and size of a
shadow
stack for the newly created process, we validate that the range
specified
is accessible to userspace but do not validate that it has been
mapped
appropriately for use as a shadow stack (normally via
map_shadow_stack()).
If the shadow stack is specified in this way then the caller is
responsible
for freeing the memory as with the main stack. If no shadow stack is
specified then the existing implicit allocation and freeing behaviour
is
maintained.

This will give userspace new powers, very close to a "set SSP" ability.
They could start a new thread on an active shadow stack, corrupt it,
etc.

That's true.

One way to avoid this would be to have shstk_alloc_thread_stack()
consume a token on the shadow stack passed in the clone args. But it's
tricky because there is not a CMPXCHG, on x86 at least, that works with
shadow stack accesses. So the kernel would probably have to GUP the
page and do a normal CMPXCHG off of the direct map.

That said, it's already possible to get two threads on the same shadow
stack by unmapping one and mapping another shadow stack in the same
place, while the target thread is not doing a call/ret. I don't know if
there is anything we could do about that without serious compatibility
restrictions. But this patch would make it a bit more trivial.

I might lean towards the token solution, even if it becomes more heavy
weight to use clone3 in this way. It depends on whether the above is
worth defending.

Right.  We're already adding the cost of the extra map_shadow_stack() so
it doesn't seem that out of scope.  We could also allow clone3() to be
used for allocation, potentially removing the ability to specify the
address entirely and only specifying the size.  I did consider that
option but it felt awkward in the API, though equally the whole shadow
stack allocation thing is a bit that way.  That would avoid concerns
about placing and validating tokens entirely but gives less control to
userspace.

This also doesn't do anything to stop anyone trying to allocate sub page
shadow stacks if they're trying to save memory with all the lack of
overrun protection that implies, though that seems to me to be much more
of a deliberate decision that people might make, a token would prevent
that too unless write access to the shadow stack is enabled.

quoted

+               /*
+                * This doesn't validate that the addresses are
mapped
+                * VM_SHADOW_STACK, just that they're mapped at all.
+                */

It just checks the range, right?

Yes, same check as for the normal stack.

Attachments

signature.asc [application/pgp-signature] 488 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help