Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
From: Jann Horn <jannh@google.com>
Date: 2022-10-06 15:36:19
Also in:
linux-fsdevel, linux-hardening, linux-mm, lkml, selinux
On Thu, Oct 6, 2022 at 5:25 PM Kees Cook [off-list ref] wrote:
On October 6, 2022 7:13:37 AM PDT, Jann Horn [off-list ref] wrote:quoted
On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner [off-list ref] wrote:quoted
On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:quoted
The check_unsafe_exec() counting of n_fs would not add up under a heavily threaded process trying to perform a suid exec, causing the suid portion to fail. This counting error appears to be unneeded, but to catch any possible conditions, explicitly unshare fs_struct on exec, if it ends upIsn't this a potential uapi break? Afaict, before this change a call to clone{3}(CLONE_FS) followed by an exec in the child would have the parent and child share fs information. So if the child e.g., changes the working directory post exec it would also affect the parent. But after this change here this would no longer be true. So a child changing a workding directoro would not affect the parent anymore. IOW, an exec is accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but it seems like a non-trivial uapi change but there might be few users that do clone{3}(CLONE_FS) followed by an exec.I believe the following code in Chromium explicitly relies on this behavior, but I'm not sure whether this code is in active use anymore: https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromiumOh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed... It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...
Random idea that I haven't thought about a lot: One approach might be to not do it by counting, but instead have a flag on the fs_struct that we set when someone does a clone() with CLONE_FS but without CLONE_THREAD? Then we'd end up with the following possible states for fs_struct: - single-process, normal - single-process, pending execve past check_unsafe_exec() (prevent concurrent CLONE_FS) - shared between processes The slight difference from the old semantics would be that once you've used CLONE_FS without CLONE_THREAD, you can never do setuid execve() from your current process again (without calling unshare()), even if the child disappears in the meantime. I think that might be an acceptably small UAPI break.