Thread (15 messages) 15 messages, 3 authors, 2021-08-12

Re: [PATCH v1 3/7] kernel/fork: always deny write access to current MM exe_file

From: Christian Brauner <hidden>
Date: 2021-08-12 12:32:58
Also in: linux-api, linux-fsdevel, linux-unionfs, lkml

On Thu, Aug 12, 2021 at 12:13:44PM +0200, David Hildenbrand wrote:
On 12.08.21 12:05, Christian Brauner wrote:
quoted
[+Cc Andrei]

On Thu, Aug 12, 2021 at 10:43:44AM +0200, David Hildenbrand wrote:
quoted
We want to remove VM_DENYWRITE only currently only used when mapping the
executable during exec. During exec, we already deny_write_access() the
executable, however, after exec completes the VMAs mapped
with VM_DENYWRITE effectively keeps write access denied via
deny_write_access().

Let's deny write access when setting the MM exe_file. With this change, we
can remove VM_DENYWRITE for mapping executables.

This represents a minor user space visible change:
sys_prctl(PR_SET_MM_EXE_FILE) can now fail if the file is already
opened writable. Also, after sys_prctl(PR_SET_MM_EXE_FILE), the file
Just for completeness, this also affects PR_SET_MM_MAP when exe_fd is
set.
Correct.
quoted
quoted
cannot be opened writable. Note that we can already fail with -EACCES if
the file doesn't have execute permissions.

Signed-off-by: David Hildenbrand <redacted>
---
The biggest user I know and that I'm involved in is CRIU which heavily
uses PR_SET_MM_MAP (with a fallback to PR_SET_MM_EXE_FILE on older
kernels) during restore. Afair, criu opens the exe fd as an O_PATH
during dump and thus will use the same flag during restore when
opening it. So that should be fine.
Yes.
quoted
However, if I understand the consequences of this change correctly, a
problem could be restoring workloads that hold a writable fd open to
their exe file at dump time which would mean that during restore that fd
would be reopened writable causing CRIU to fail when setting the exe
file for the task to be restored.
If it's their exe file, then the existing VM_DENYWRITE handling would have
forbidden these workloads to open the fd of their exe file writable, right?
Yes.
At least before doing any PR_SET_MM_MAP/PR_SET_MM_EXE_FILE. But that should
rule out quite a lot of cases we might be worried about, right?
Yes, it rules out the most obvious cases. The problem is really just
that we don't know how common weirder cases are. But that doesn't mean
we shouldn't try and risk it. This is a nice cleanup and playing
/proc/self/exe games isn't super common.
quoted
Which honestly, no idea how many such workloads exist. (I know at least
of runC and LXC need to sometimes reopen to rexec themselves (weird bug
to protect against attacking the exe file) and thus re-open
/proc/self/exe but read-only.)
quoted
  kernel/fork.c | 39 ++++++++++++++++++++++++++++++++++-----
  1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 6bd2e52bcdfb..5d904878f19b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -476,6 +476,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
  {
  	struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
  	struct rb_node **rb_link, *rb_parent;
+	struct file *exe_file;
  	int retval;
  	unsigned long charge;
  	LIST_HEAD(uf);
@@ -493,7 +494,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
  	mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);
  	/* No ordering required: file already has been exposed. */
-	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+	exe_file = get_mm_exe_file(oldmm);
+	RCU_INIT_POINTER(mm->exe_file, exe_file);
+	if (exe_file)
+		deny_write_access(exe_file);
  	mm->total_vm = oldmm->total_vm;
  	mm->data_vm = oldmm->data_vm;
@@ -638,8 +642,13 @@ static inline void mm_free_pgd(struct mm_struct *mm)
  #else
  static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
  {
+	struct file *exe_file;
+
  	mmap_write_lock(oldmm);
-	RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+	exe_file = get_mm_exe_file(oldmm);
+	RCU_INIT_POINTER(mm->exe_file, exe_file);
+	if (exe_file)
+		deny_write_access(exe_file);
  	mmap_write_unlock(oldmm);
  	return 0;
  }
@@ -1163,11 +1172,19 @@ void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
  	 */
  	old_exe_file = rcu_dereference_raw(mm->exe_file);
-	if (new_exe_file)
+	if (new_exe_file) {
  		get_file(new_exe_file);
+		/*
+		 * exec code is required to deny_write_access() successfully,
+		 * so this cannot fail
+		 */
+		deny_write_access(new_exe_file);
+	}
  	rcu_assign_pointer(mm->exe_file, new_exe_file);
-	if (old_exe_file)
+	if (old_exe_file) {
+		allow_write_access(old_exe_file);
  		fput(old_exe_file);
+	}
  }
  int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
@@ -1194,10 +1211,22 @@ int atomic_set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
  	}
  	/* set the new file, lockless */
+	ret = deny_write_access(new_exe_file);
+	if (ret)
+		return -EACCES;
  	get_file(new_exe_file);
+
  	old_exe_file = xchg(&mm->exe_file, new_exe_file);
-	if (old_exe_file)
+	if (old_exe_file) {
+		/*
+		 * Don't race with dup_mmap() getting the file and disallowing
+		 * write access while someone might open the file writable.
+		 */
+		mmap_read_lock(mm);
+		allow_write_access(old_exe_file);
  		fput(old_exe_file);
+		mmap_read_unlock(mm);
+	}
  	return 0;
  }
-- 
2.31.1

-- 
Thanks,

David / dhildenb
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help