Re: [PATCH 11/11] seccomp: Add tgid and tid into seccomp_data
From: David Drysdale <hidden>
Date: 2014-07-27 12:09:24
Also in:
lkml
On Fri, Jul 25, 2014 at 6:18 PM, Andy Lutomirski [off-list ref] wrote:
[cc: Eric Biederman] On Fri, Jul 25, 2014 at 10:10 AM, Kees Cook [off-list ref] wrote:quoted
On Fri, Jul 25, 2014 at 8:59 AM, Andy Lutomirski [off-list ref] wrote:quoted
On Jul 25, 2014 6:48 AM, "David Drysdale" [off-list ref] wrote:quoted
Add the current thread and thread group IDs into the data available for seccomp-bpf programs to work on. This allows installation of filters that police syscalls based on thread or process ID, e.g. tgkill(2)/kill(2)/prctl(2). Signed-off-by: David Drysdale <redacted> --- include/uapi/linux/seccomp.h | 10 ++++++++++ kernel/seccomp.c | 2 ++ 2 files changed, 12 insertions(+)diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index ac2dc9f72973..b88370d6f6ca 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h@@ -36,12 +36,22 @@ * @instruction_pointer: at the time of the system call. * @args: up to 6 system call arguments always stored as 64-bit values * regardless of the architecture. + * @tgid: thread group ID of the thread executing the BPF program. + * @tid: thread ID of the thread executing the BPF program. + * The SECCOMP_DATA_TID_PRESENT macro indicates the presence of the + * tgid and tid fields; user programs may use this macro to conditionally + * compile code against older versions of the kernel. Note also that + * BPF programs should cope with the absence of these fields by testing + * the length of data available. */ struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; + __u32 tgid; + __u32 tid; }; +#define SECCOMP_DATA_TID_PRESENT 1 #endif /* _UAPI_LINUX_SECCOMP_H */diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 301bbc24739c..dd5146f15d6d 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c@@ -80,6 +80,8 @@ static void populate_seccomp_data(struct seccomp_data *sd) sd->args[4] = args[4]; sd->args[5] = args[5]; sd->instruction_pointer = KSTK_EIP(task); + sd->tgid = task_tgid_vnr(current); + sd->tid = task_pid_vnr(current); }This is, IMO, problematic. These should probably be relative to the filter creator, not the filtered task. This will also hurt performance.Yeah, we can't change the seccomp_data structure without a lot of care, and tgid/tid really should be encoded in the filter. However, it is tricky in the forking case.quoted
What's the use case? Can it be better achieved with a new eBPF function?
The specific use case is to be able to write a filter that allows kill(2) or tgkill(2) to self, where the filter still works after forking. Capsicum capability mode in general locks down system calls that access PIDs (as they're a global namespace), but allows kill(self) as a pragmatic compromise to make it easier to migrate applications to use Capsicum.
quoted
Julien had been wanting something like this too (though he'd suggested it via prctl): limit the signal functions to "self" only. I wonder if adding a prctl like done for O_BENEATH could work for signal sending?Can we do one better and add a flag to prevent any non-self pid lookups? This might actually be easy on top of the pid namespace work (e.g. we could change the way that find_task_by_vpid works).
That sounds like a good idea, as long as it's possible for non-CAP_SYS_ADMIN processes to do....
It's far from just being signals. There's access_process_vm, ptrace, all the signal functions, clock_gettime (see CPUCLOCK_PID -- yes, this is ridiculous), and probably some others that I've forgotten about or never noticed in the first place.
For the Capsicum case in particular, most of these are restricted by the capability mode filter anyhow (although I need to fix it for CPUCLOCK_PID -- thanks for pointing that out); the kill(2) case was a special case to make migrations easier. But a more general mechanism seems sensible.
--Andyquoted
-Kees -- Kees Cook Chrome OS Security-- Andy Lutomirski AMA Capital Management, LLC