Thread: 123 messages, 12 authors, 2018-08-14

Re: [RFC PATCH v2 17/27] x86/cet/shstk: User-mode shadow stack support

From: Yu-cheng Yu <hidden>
Date: 2018-07-13 18:06:48
Also in: linux-api, linux-arch, linux-mm, lkml

On Wed, 2018-07-11 at 15:21 -0700, Andy Lutomirski wrote:
On Jul 11, 2018, at 2:51 PM, Jann Horn [off-list ref] wrote:

On Wed, Jul 11, 2018 at 2:34 PM Andy Lutomirski [off-list ref] wrote:
On Jul 11, 2018, at 2:10 PM, Jann Horn [off-list ref] wrote:
On Tue, Jul 10, 2018 at 3:31 PM Yu-cheng Yu [off-list ref] wrote:

This patch adds basic shadow stack enabling/disabling routines.
A task's shadow stack is allocated from memory with VM_SHSTK
flag set and read-only protection.  The shadow stack is
allocated to a fixed size.

Signed-off-by: Yu-cheng Yu <redacted>
[...]
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
new file mode 100644
index 000000000000..96bf69db7da7
--- /dev/null
+++ b/arch/x86/kernel/cet.c
[...]
+static unsigned long shstk_mmap(unsigned long addr, unsigned long len)
+{
+       struct mm_struct *mm = current->mm;
+       unsigned long populate;
+
+       down_write(&mm->mmap_sem);
+       addr = do_mmap(NULL, addr, len, PROT_READ,
+                      MAP_ANONYMOUS | MAP_PRIVATE, VM_SHSTK,
+                      0, &populate, NULL);
+       up_write(&mm->mmap_sem);
+
+       if (populate)
+               mm_populate(addr, populate);
+
+       return addr;
+}
[...]
Should the kernel enforce that two shadow stacks must have a guard
page between them so that they can not be directly adjacent, so that
if you have too much recursion, you can't end up corrupting an
adjacent shadow stack?
I think the answer is a qualified “no”. I would like to instead enforce a general guard page on all mmaps that don’t use MAP_FORCE. We *might* need to exempt any mmap with an address hint for
compatibility.
I like this idea a lot.
My commercial software has been manually adding guard pages on every single mmap done by tcmalloc for years, and it has caught a couple bugs and costs essentially nothing.
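
For illustration, the manual guard-page approach described above could look roughly like the userspace sketch below. The helper names guarded_mmap() and guarded_munmap() are hypothetical and are not taken from tcmalloc or from the patch. The sketch assumes a POSIX system with anonymous mmap and simply places one inaccessible page on each side of the usable range.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map at least len usable bytes with one PROT_NONE
 * guard page directly below and directly above the usable range.
 * Returns a pointer to the usable range, or NULL on failure.
 */
static void *guarded_mmap(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t body = (len + page - 1) & ~(page - 1);   /* round up to pages */
	size_t total = body + 2 * page;
	char *base;

	/* Map the whole range inaccessible first, guards included. */
	base = mmap(NULL, total, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Open up only the middle; the first and last page stay PROT_NONE. */
	if (mprotect(base + page, body, PROT_READ | PROT_WRITE) != 0) {
		munmap(base, total);
		return NULL;
	}
	return base + page;
}

/* Hypothetical counterpart: unmap the usable range and both guard pages. */
static int guarded_munmap(void *p, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t body = (len + page - 1) & ~(page - 1);

	return munmap((char *)p - page, body + 2 * page);
}
```

An access that runs off either end of such a mapping faults in a guard page instead of silently touching a neighboring mapping, which is the property being asked for between adjacent shadow stacks.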

Hmm. Linux should maybe add something like Windows’ “reserved” virtual memory. It’s basically a way to ask for a VA range that explicitly contains nothing and can subsequently be turned into
something useful with the equivalent of MAP_FORCE.
What's the benefit over creating an anonymous PROT_NONE region? That
the kernel won't have to scan through the corresponding PTEs when
tearing down the mapping?
Make it more obvious what’s happening and avoid accounting issues?  What I’ve actually used is MAP_NORESERVE | PROT_NONE, but I think this still counts against the VA rlimit. But maybe that’s
actually the desired behavior.
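
As a rough userspace sketch of the reserve-then-populate pattern discussed above: the helper names va_reserve() and va_commit() are hypothetical, and the sketch assumes that the MAP_FORCE proposed in this thread would behave like today's MAP_FIXED, which replaces whatever is already mapped at the given address.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS / MAP_NORESERVE */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: reserve len bytes of address space that contain
 * nothing accessible. MAP_NORESERVE asks the kernel not to reserve swap
 * for the range. Returns NULL on failure.
 */
static void *va_reserve(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: turn part of a reservation into usable memory.
 * addr must be page-aligned and lie inside a range from va_reserve().
 * MAP_FIXED stands in here for the thread's proposed MAP_FORCE.
 */
static void *va_commit(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

The sketch does not settle the accounting question raised above; it only shows the two steps the reservation idea would make explicit.
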
We can put a NULL at both ends of a SHSTK to guard against corruption.

Yu-cheng 

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html