[PATCH v5 3/5] x86: Split syscall_trace_enter into two phases

From: Dmitry V. Levin <hidden>
Date: 2015-02-06 02:32:56
Also in: linux-arch, linux-mips, lkml

On Thu, Feb 05, 2015 at 04:09:06PM -0800, Andy Lutomirski wrote:

On Thu, Feb 5, 2015 at 3:49 PM, Kees Cook [off-list ref] wrote:

On Thu, Feb 5, 2015 at 3:39 PM, Dmitry V. Levin [off-list ref] wrote:

[...]

There is a clear difference: before these changes, SECCOMP_RET_ERRNO used
to keep the syscall number unchanged and suppress the syscall-exit-stop event,
which was awful because userspace cannot distinguish syscall-enter-stop
from syscall-exit-stop and therefore relies on the kernel to guarantee that
syscall-enter-stop is followed by syscall-exit-stop (or tracee's death, etc.).

After these changes, SECCOMP_RET_ERRNO no longer causes syscall-exit-stop
events to be suppressed, but now the syscall number is lost.
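
The enter/exit bookkeeping described above can be sketched roughly as
follows. This is only an illustration, not strace's actual code: the
names tracee_state, stop_kind and classify_syscall_stop are hypothetical.
It assumes the tracer has no other way to tell the two stops apart, so
it simply alternates.

```c
#include <stdbool.h>

/* Hypothetical per-tracee bookkeeping; not taken from strace. */
struct tracee_state {
    bool in_syscall;    /* true between an enter-stop and its exit-stop */
};

enum stop_kind { STOP_ENTER, STOP_EXIT };

/*
 * Both kinds of syscall-stop look the same to the tracer, so the only
 * thing it can do is assume that they strictly alternate.
 */
static enum stop_kind classify_syscall_stop(struct tracee_state *t)
{
    enum stop_kind kind = t->in_syscall ? STOP_EXIT : STOP_ENTER;

    t->in_syscall = !t->in_syscall;
    return kind;
}
```

If the kernel suppresses one exit-stop, the enter-stop of the next
syscall is classified as STOP_EXIT and every later stop is off by one,
which is why the old SECCOMP_RET_ERRNO behaviour was a problem.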

Ah-ha! Okay, thanks, I understand now. I think this means seccomp
phase1 should not treat RET_ERRNO as a "skip" event. Andy, what do you
think here?

I still don't quite see how this change caused this.

I have a test for this at
http://sourceforge.net/p/strace/code/ci/HEAD/~/tree/test/seccomp.c

I can play with it a bit more.  But RET_ERRNO *has* to be some kind of
skip event, because it needs to skip the syscall.

We could change this by treating RET_ERRNO as an instruction to enter
phase 2 and then asking for a skip in phase 2 without changing
orig_ax, but IMO this is pretty ugly.

I think this all kind of sucks.  We're trying to run ptrace after
seccomp, so ptrace is seeing the syscalls as transformed by seccomp.
That means that if we use RET_TRAP, then ptrace will see the
possibly-modified syscall, if we use RET_ERRNO, then ptrace is (IMO
correctly given the current design) showing syscall -1, and if we use
RET_KILL, then ptrace just sees the process mysteriously die.

Userspace is usually not prepared to see syscall -1.
For example, strace had to be patched, otherwise it just skipped such
syscalls as "not a syscall" events or did other improper things:
http://sourceforge.net/p/strace/code/ci/c3948327717c29b10b5e00a436dc138b4ab1a486
http://sourceforge.net/p/strace/code/ci/8e398b6c4020fb2d33a5b3e40271ebf63199b891

A slightly different but related story: userspace is also not prepared
to handle large errno values produced by seccomp filters like this:
BPF_STMT(BPF_RET, SECCOMP_RET_ERRNO | SECCOMP_RET_DATA)

For example, glibc assumes that syscalls do not return errno values greater than 0xfff:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sysdep.h#l55
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/syscall.S#l20

If it isn't too late, I'd recommend changing the SECCOMP_RET_DATA mask
applied in the SECCOMP_RET_ERRNO case from the current 0xffff to 0xfff.
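
To make the mismatch concrete, here is a small sketch of the arithmetic.
It assumes that the kernel returns the negated, masked filter data as
the raw syscall result, and that libc treats only raw values in
[-4095, -1] as errors, as in the glibc sources linked above. The helper
names raw_return_for and libc_sees_error and the two mask macros are
made up for this example.

```c
#include <stdbool.h>

/* Assumed values: the current mask and the proposed one. */
#define CURRENT_ERRNO_MASK   0xffffU
#define PROPOSED_ERRNO_MASK  0x0fffU

/*
 * Raw value the syscall would return when a filter asks for `data`
 * as the errno and the kernel applies `mask` to it (assumption).
 */
static long raw_return_for(unsigned int data, unsigned int mask)
{
    return -(long)(data & mask);
}

/* glibc-style check: raw values in [-4095, -1] count as errors. */
static bool libc_sees_error(long raw)
{
    return (unsigned long)raw >= (unsigned long)-4095L;
}
```

With the 0xffff mask, a filter returning SECCOMP_RET_ERRNO |
SECCOMP_RET_DATA yields a raw value of -65535, which this check does not
recognize as an error. With the 0xfff mask the same filter yields -4095,
which it does.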


-- 
ldv
