Thread (3 messages) 3 messages, 2 authors, 2024-01-03

Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for tasklist_lock

From: "Jarkko Sakkinen" <jarkko@kernel.org>
Date: 2024-01-03 14:04:59
Also in: keyrings, linux-arm-msm, linux-fsdevel, linux-mm, lkml

On Mon Dec 25, 2023 at 10:19 AM EET, Maria Yu wrote:
quoted hunk ↗ jump to hunk
As a rwlock for tasklist_lock, there are multiple scenarios to acquire
read lock which write lock needed to be waiting for.
In freeze_process/thaw_processes it can take about 200+ms for holding read
lock of tasklist_lock by walking and freezing/thawing tasks in commercial
devices. And write_lock_irq will have preempt disabled and local irq
disabled to spin until the tasklist_lock can be acquired. This leading to
a bad responsive performance of current system.
Take an example:
1. cpu0 is holding read lock of tasklist_lock to thaw_processes.
2. cpu1 is waiting write lock of tasklist_lock to exec a new thread with
   preempt_disabled and local irq disabled.
3. cpu2 is waiting write lock of tasklist_lock to do_exit with
   preempt_disabled and local irq disabled.
4. cpu3 is waiting write lock of tasklist_lock to do_exit with
   preempt_disabled and local irq disabled.
So introduce a write lock/unlock wrapper for tasklist_lock specificly.
The current taskslist_lock writers all have write_lock_irq to hold
tasklist_lock, and write_unlock_irq to release tasklist_lock, that means
the writers are not suitable or workable to wait on tasklist_lock in irq
disabled scenarios. So the write lock/unlock wrapper here only follow the
current design of directly use local_irq_disable and local_irq_enable,
and not take already irq disabled writer callers into account.
Use write_trylock in the loop and enabled irq for cpu to repsond if lock
cannot be taken.

Signed-off-by: Maria Yu <redacted>
---
 fs/exec.c                  | 10 +++++-----
 include/linux/sched/task.h | 29 +++++++++++++++++++++++++++++
 kernel/exit.c              | 16 ++++++++--------
 kernel/fork.c              |  6 +++---
 kernel/ptrace.c            | 12 ++++++------
 kernel/sys.c               |  8 ++++----
 security/keys/keyctl.c     |  4 ++--
 7 files changed, 57 insertions(+), 28 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 4aa19b24f281..030eef6852eb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1086,7 +1086,7 @@ static int de_thread(struct task_struct *tsk)
 
 		for (;;) {
 			cgroup_threadgroup_change_begin(tsk);
-			write_lock_irq(&tasklist_lock);
+			write_lock_tasklist_lock();
 			/*
 			 * Do this under tasklist_lock to ensure that
 			 * exit_notify() can't miss ->group_exec_task
@@ -1095,7 +1095,7 @@ static int de_thread(struct task_struct *tsk)
 			if (likely(leader->exit_state))
 				break;
 			__set_current_state(TASK_KILLABLE);
-			write_unlock_irq(&tasklist_lock);
+			write_unlock_tasklist_lock();
 			cgroup_threadgroup_change_end(tsk);
 			schedule();
 			if (__fatal_signal_pending(tsk))
@@ -1150,7 +1150,7 @@ static int de_thread(struct task_struct *tsk)
 		 */
 		if (unlikely(leader->ptrace))
 			__wake_up_parent(leader, leader->parent);
-		write_unlock_irq(&tasklist_lock);
+		write_unlock_tasklist_lock();
 		cgroup_threadgroup_change_end(tsk);
 
 		release_task(leader);
@@ -1198,13 +1198,13 @@ static int unshare_sighand(struct task_struct *me)
 
 		refcount_set(&newsighand->count, 1);
 
-		write_lock_irq(&tasklist_lock);
+		write_lock_tasklist_lock();
 		spin_lock(&oldsighand->siglock);
 		memcpy(newsighand->action, oldsighand->action,
 		       sizeof(newsighand->action));
 		rcu_assign_pointer(me->sighand, newsighand);
 		spin_unlock(&oldsighand->siglock);
-		write_unlock_irq(&tasklist_lock);
+		write_unlock_tasklist_lock();
 
 		__cleanup_sighand(oldsighand);
 	}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index a23af225c898..6f69d9a3c868 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -50,6 +50,35 @@ struct kernel_clone_args {
  * a separate lock).
  */
 extern rwlock_t tasklist_lock;
+
+/*
+ * Tasklist_lock is a special lock, it takes a good amount of time of
+ * taskslist_lock readers to finish, and the pure write_irq_lock api
+ * will do local_irq_disable at the very first, and put the current cpu
+ * waiting for the lock while is non-responsive for interrupts.
+ *
+ * The current taskslist_lock writers all have write_lock_irq to hold
+ * tasklist_lock, and write_unlock_irq to release tasklist_lock, that
+ * means the writers are not suitable or workable to wait on
+ * tasklist_lock in irq disabled scenarios. So the write lock/unlock
+ * wrapper here only follow the current design of directly use
+ * local_irq_disable and local_irq_enable.
+ */
+static inline void write_lock_tasklist_lock(void)
+{
+	while (1) {
+		local_irq_disable();
+		if (write_trylock(&tasklist_lock))
+			break;
+		local_irq_enable();
+		cpu_relax();
+	}
Maybe:

	local_irq_disable();
	while (!write_trylock(&tasklist_lock)) {
		local_irq_enable();
		cpu_relax();
		local_irq_disable();
	}

BR, Jarkko
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help