Thread: 154 messages, 12 authors, 2023-03-20

Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description

From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Date: 2023-03-06 18:07:10
Also in: linux-api, linux-arch, linux-mm, lkml

+Kan for shadow stack perf discussion.

On Mon, 2023-03-06 at 16:20 +0000, szabolcs.nagy@arm.com wrote:
> The 03/03/2023 22:35, Edgecombe, Rick P wrote:
> > I think I slightly prefer the former arch_prctl() based solution for a
> > few reasons:
> >   - When you need to find the start or end of the shadow stack, you
> >     can just ask for it instead of searching. It can be faster and
> >     simpler.
> >   - It saves 8 bytes of memory per shadow stack.
> >
> > If this turns out to be wrong and we want to do the marker solution
> > much later at some point, the safest option would probably be to
> > create new flags.
>
> i see two problems with a get bounds syscall:
>
> - syscall overhead.
>
> - discontinuous shadow stack (e.g. alt shadow stack ends with a
>   pointer to the interrupted thread shadow stack, so stack trace
>   can continue there, except you don't know the bounds of that).
>
> > But just discussing this with HJ, can you share more on what the
> > usage is? Like which backtracing operation specifically needs the
> > marker? How much does it care about the ucontext case?
>
> it could be an option for perf or ptracers to sample the stack trace.
>
> in-process collection of stack trace for profiling or crash reporting
> (e.g. when stack is corrupted) or cross checking stack integrity may
> use it too.
>
> sometimes parsing /proc/self/smaps may be enough, but the idea was to
> enable light-weight backtrace collection in an async-signal-safe way.
>
> syscall overhead in case of frequent stack trace collection can be
> avoided by caching (in tls) when ssp falls within the thread shadow
> stack bounds. otherwise caching does not work as the shadow stack may
> be reused (alt shadow stack or ucontext case).
>
> unfortunately i don't know if syscall overhead is actually a problem
> (probably not) or if backtrace across signal handlers needs to work
> with alt shadow stack (i guess it should work for crash reporting).

There was a POC done of perf integration. I'm not too knowledgeable on
perf, but the patch itself didn't need any new shadow stack bounds ABI.
Since it was implemented in the kernel, it could just refer to the
kernel's internal data for the thread's shadow stack bounds.

I asked about ucontext (similar to alt shadow stacks in regards to lack
of bounds ABI), and apparently perf usually focuses on the thread
stacks. Hopefully Kan can lend some more confidence to that assertion.
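
For illustration only, the TLS caching idea quoted above could look
roughly like the sketch below. Everything in it is an assumption made
for the example: struct shstk_bounds, shstk_query_fn (a stand-in for
whatever "get bounds" interface might exist), shstk_cache_thread_bounds()
and the demo_query() stub are hypothetical names, not an existing kernel
or libc API. Only the thread shadow stack bounds are cached. Any ssp
outside them (alt shadow stack or ucontext case) always takes the slow
path, since those stacks may be reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [lo, hi) of one shadow stack. */
struct shstk_bounds {
    uintptr_t lo;
    uintptr_t hi;
};

/*
 * Hypothetical slow path: stands in for a "get bounds" syscall or any
 * other lookup. Returns true and fills *out on success.
 */
typedef bool (*shstk_query_fn)(uintptr_t ssp, struct shstk_bounds *out);

/* Per-thread cache of the thread's own shadow stack bounds. */
static _Thread_local struct shstk_bounds thread_bounds;
static _Thread_local bool thread_bounds_valid;

/* Record the thread shadow stack bounds once, e.g. at thread start. */
static void shstk_cache_thread_bounds(struct shstk_bounds b)
{
    thread_bounds = b;
    thread_bounds_valid = true;
}

/*
 * Look up the bounds of the shadow stack that contains ssp.
 * Fast path: ssp lies inside the cached thread shadow stack.
 * Slow path: ask query() every time and do not cache the answer.
 */
static bool shstk_bounds_for(uintptr_t ssp, shstk_query_fn query,
                             struct shstk_bounds *out)
{
    assert(query != NULL && out != NULL);
    if (thread_bounds_valid &&
        ssp >= thread_bounds.lo && ssp < thread_bounds.hi) {
        *out = thread_bounds;
        return true;
    }
    return query(ssp, out);
}

/* Stub slow path for illustration only; counts how often it is hit. */
static int demo_query_calls;

static bool demo_query(uintptr_t ssp, struct shstk_bounds *out)
{
    demo_query_calls++;
    out->lo = ssp & ~(uintptr_t)0xfff;  /* pretend the stack is one page */
    out->hi = out->lo + 0x1000;
    return true;
}
```

A backtracer would call shstk_cache_thread_bounds() once per thread and
then shstk_bounds_for() with the current ssp on each sample. Whether the
saved syscall matters in practice is the open question from the quoted
text.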