Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF
From: Will Drewry <wad@chromium.org>
Date: 2012-04-10 20:15:58
Also in:
linux-arch, lkml
On Tue, Apr 10, 2012 at 2:54 PM, Andrew Morton [off-list ref] wrote:
On Mon, 9 Apr 2012 04:22:40 +1000 "Indan Zupancic" [off-list ref] wrote:quoted
On Sat, April 7, 2012 06:23, Andrew Morton wrote:quoted
hm, I'm surprised that we don't have a zero-returning implementation of is_compat_task() when CONFIG_COMPAT=n. Seems silly. Blames Arnd.It's sneakily hidden at the end of compat.h.quoted
quoted
+/** + * get_u32 - returns a u32 offset into data + * @data: a unsigned 64 bit value + * @index: 0 or 1 to return the first or second 32-bits + * + * This inline exists to hide the length of unsigned long. + * If a 32-bit unsigned long is passed in, it will be extended + * and the top 32-bits will be 0. If it is a 64-bit unsigned + * long, then whatever data is resident will be properly returned. + */ +static inline u32 get_u32(u64 data, int index) +{ + return ((u32 *)&data)[index]; +}This seems utterly broken on big-endian machines. If so: fix. If not: add comment explaining why?It's not a bug, it's intentional.Well it looks like a bug, which is why I suggest that it be clearly commented.
I've added a comment indicating it is intentionally ugly.
quoted
quoted
quoted
+ if (total_insns > MAX_INSNS_PER_PATH) + return -ENOMEM; + + /* + * Installing a seccomp filter requires that the task have + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. + * This avoids scenarios where unprivileged tasks can affect the + * behavior of privileged children. + */ + if (!current->no_new_privs && + security_capable_noaudit(current_cred(), current_user_ns(), + CAP_SYS_ADMIN) != 0) + return -EACCES; + + /* Allocate a new seccomp_filter */ + filter = kzalloc(sizeof(struct seccomp_filter) + fp_size, GFP_KERNEL);I think this gives userspace an easy way of causing page allocation failure warnings, by permitting large kmalloc() attempts. Add __GFP_NOWARN?Max is 32kb. sk_attach_filter() in net/core/filter.c is worse, it allocates up to 512kb before even checking the length.An order-3 allocation attempt is pretty fragile. This will sometimes fail.quoted
What about using GFP_USER (and adding __GFP_NOWARN to GFP_USER) instead?Let's be conventional and use the open-coded __GFP_NOWARN. __GFP_NOWARN says "this is a big allocation which will sometimes fail and I have carefully reviewed the failure paths and runtime tested them". Please carefully review the failure paths and runtime test them ;)
Thankfully the failure path is simple in this case. Additional runtime testing in progress :)
quoted
quoted
quoted
+ /* Check and rewrite the fprog via the skb checker */ + ret = sk_chk_filter(filter->insns, filter->len); + if (ret) + goto fail; + + /* Check and rewrite the fprog for seccomp use */ + ret = seccomp_chk_filter(filter->insns, filter->len);"check" is spelled "check"!Yes, it is and he did spell "check" as "Check". seccomp_chk_filter() mirrors sk_chk_filter(). So it refers to "chk", not "check".bah. Two poor identifiers isn't better than one. Whatever.
As per James's comment, reducing it to one poor identifier. thanks!