[PATCH v5 3/5] x86: Split syscall_trace_enter into two phases

[PATCH v5 0/5] x86: two-phase syscall tracing and seccomp fastpath · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
[PATCH v5 1/5] x86,x32,audit: Fix x32's AUDIT_ARCH wrt audit · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
[PATCH v5 2/5] x86,entry: Only call user_exit if TIF_NOHZ · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
[PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Dmitry V. Levin <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Dmitry V. Levin <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Dmitry V. Levin <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-05
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Dmitry V. Levin <hidden> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Kees Cook <hidden> · 2015-02-06
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · Andy Lutomirski <luto@amacapital.net> · 2015-02-06
Re: a method to distinguish between syscall-enter/exit-stop · Dmitry V. Levin <hidden> · 2015-02-06
Re: a method to distinguish between syscall-enter/exit-stop · Kees Cook <hidden> · 2015-02-07
Re: a method to distinguish between syscall-enter/exit-stop · Dmitry V. Levin <hidden> · 2015-02-07
Re: [PATCH v5 3/5] x86: Split syscall_trace_enter into two phases · "H. Peter Anvin" <hpa@zytor.com> · 2015-02-06
[PATCH v5 4/5] x86_64,entry: Treat regs->ax the same in fastpath and slowpath syscalls · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
[PATCH v5 5/5] x86_64,entry: Use split-phase syscall_trace_enter for 64-bit syscalls · Andy Lutomirski <luto@amacapital.net> · 2014-09-05
Re: [PATCH v5 0/5] x86: two-phase syscall tracing and seccomp fastpath · Kees Cook <hidden> · 2014-09-08
Re: [PATCH v5 0/5] x86: two-phase syscall tracing and seccomp fastpath · "H. Peter Anvin" <hpa@zytor.com> · 2014-09-08

From: luto@amacapital.net (Andy Lutomirski)
Date: 2015-02-06 02:39:15
Also in: linux-arch, linux-mips, lkml

On Thu, Feb 5, 2015 at 6:32 PM, Dmitry V. Levin [off-list ref] wrote:

On Thu, Feb 05, 2015 at 04:09:06PM -0800, Andy Lutomirski wrote:

quoted

On Thu, Feb 5, 2015 at 3:49 PM, Kees Cook [off-list ref] wrote:

quoted

On Thu, Feb 5, 2015 at 3:39 PM, Dmitry V. Levin [off-list ref] wrote:

[...]

quoted

There is a clear difference: before these changes, SECCOMP_RET_ERRNO used
to keep the syscall number unchanged and suppress syscall-exit-stop event,
which was awful because userspace cannot distinguish syscall-enter-stop
from syscall-exit-stop and therefore relies on the kernel that
syscall-enter-stop is followed by syscall-exit-stop (or tracee's death, etc.).

After these changes, SECCOMP_RET_ERRNO no longer causes syscall-exit-stop
events to be suppressed, but now the syscall number is lost.

Ah-ha! Okay, thanks, I understand now. I think this means seccomp
phase1 should not treat RET_ERRNO as a "skip" event. Andy, what do you
think here?

I still don't quite see how this change caused this.

I have a test for this at
http://sourceforge.net/p/strace/code/ci/HEAD/~/tree/test/seccomp.c

quoted

I can play with
it a bit more.  But RET_ERRNO *has* to be some kind of skip event,
because it needs to skip the syscall.

We could change this by treating RET_ERRNO as an instruction to enter
phase 2 and then asking for a skip in phase 2 without changing
orig_ax, but IMO this is pretty ugly.

I think this all kind of sucks.  We're trying to run ptrace after
seccomp, so ptrace is seeing the syscalls as transformed by seccomp.
That means that if we use RET_TRAP, then ptrace will see the
possibly-modified syscall, if we use RET_ERRNO, then ptrace is (IMO
correctly given the current design) showing syscall -1, and if we use
RET_KILL, then ptrace just sees the process mysteriously die.

Userspace is usually not prepared to see syscall -1.
For example, strace had to be patched, otherwise it just skipped such
syscalls as "not a syscall" events or did other improper things:
http://sourceforge.net/p/strace/code/ci/c3948327717c29b10b5e00a436dc138b4ab1a486
http://sourceforge.net/p/strace/code/ci/8e398b6c4020fb2d33a5b3e40271ebf63199b891

The x32 thing is a legit ABI bug, I'd argue.  I'd be happy to submit a
patch to fix that (clear the x32 bit if we're not x32).

A slightly different but related story: userspace is also not prepared
to handle large errno values produced by seccomp filters like this:
BPF_STMT(BPF_RET, SECCOMP_RET_ERRNO | SECCOMP_RET_DATA)

For example, glibc assumes that syscalls do not return errno values greater than 0xfff:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sysdep.h#l55
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/syscall.S#l20

If it isn't too late, I'd recommend changing SECCOMP_RET_DATA mask
applied in SECCOMP_RET_ERRNO case from current 0xffff to 0xfff.

I think this is solidly in the "don't do that" category.  Seccomp lets
you tamper with syscalls.  If you tamper wrong, then you lose.

Kees, what do you think about reversing the whole thing so that
ptracers always see the original syscall?

--Andy

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help