Re: [RFC][PATCH v2 09/11] context_tracking,livepatch: Dont disturb NOHZ_FULL
From: Petr Mladek <pmladek@suse.com>
Date: 2021-10-06 10:29:36
Also in:
lkml
On Wed 2021-10-06 11:04:26, Peter Zijlstra wrote:
On Wed, Oct 06, 2021 at 10:12:17AM +0200, Petr Mladek wrote:quoted
IMHO, the original solution from v1 was better. We only needed toIt was also terribly broken in other 'fun' ways. See below.quoted
be careful when updating task->patch_state and clearing TIF_PATCH_PENDING to avoid the race. The following might work: static int klp_check_and_switch_task(struct task_struct *task, void *arg) { int ret; /* * Stack is reliable only when the task is not running on any CPU, * except for the task running this code. */ if (task_curr(task) && task != current) { /* * This only succeeds when the task is in NOHZ_FULL user * mode. Such a task might be migrated immediately. We * only need to be careful to set task->patch_state before * clearing TIF_PATCH_PENDING so that the task migrates * itself when entring kernel in the meatime. */ if (is_ct_user(task)) { klp_update_patch_state(task); return 0; } return -EBUSY; } ret = klp_check_stack(task, arg); if (ret) return ret; /* * The task neither is running on any CPU and nor it can get * running. As a result, the ordering is not important and * barrier is not needed. */ task->patch_state = klp_target_state; clear_tsk_thread_flag(task, TIF_PATCH_PENDING); return 0; } , where is_ct_user(task) would return true when task is running in CONTEXT_USER. If I get the context_tracking API correctly then it might be implemeted the following way:That's not sufficient, you need to tag the remote task with a ct_work item to also runs klp_update_patch_state(), otherwise the remote CPU can enter kernel space between checking is_ct_user() and doing klp_update_patch_state(): CPU0 CPU1 <user> if (is_ct_user()) // true <kernel-entry> // run some kernel code klp_update_patch_state() *WHOOPSIE* So it needs to be something like: CPU0 CPU1 <user> if (context_tracking_set_cpu_work(task_cpu(), CT_WORK_KLP)) <kernel-entry> klp_update_patch_state klp_update_patch_state() So that CPU0 and CPU1 race to complete klp_update_patch_state() *before* any regular (!noinstr) code gets run.
Grr, you are right. I thought that we migrated the task when entering kernel even before. But it seems that we do it only when leaving the kernel in exit_to_user_mode_loop().
Which then means it needs to look something like:
noinstr void klp_update_patch_state(struct task_struct *task)
{
struct thread_info *ti = task_thread_info(task);
preempt_disable_notrace();
if (arch_test_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags)) {
/*
* Order loads of TIF_PATCH_PENDING vs klp_target_state.
* See klp_init_transition().
*/
smp_rmb();
task->patch_state = __READ_ONCE(klp_target_state);
/*
* Concurrent against self; must observe updated
* task->patch_state if !TIF_PATCH_PENDING.
*/
smp_mb__before_atomic();IMHO, smp_wmb() should be enough. We are here only when this CPU set task->patch_state right above. So that CPU running this code should see the correct task->patch_state. The read barrier is needed only when @task is entering kernel and does not see TIF_PATCH_PENDING. It is handled by smp_rmb() in the "else" branch below. It is possible that both CPUs see TIF_PATCH_PENDING and both set task->patch_state. But it should not cause any harm because they set the same value. Unless something really crazy happens with the internal CPU busses and caches.
arch_clear_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags);
} else {
/*
* Concurrent against self, see smp_mb__before_atomic()
* above.
*/
smp_rmb();Yeah, this is the counter part against the above smp_wmb().
} preempt_enable_notrace(); }
Now, I am scared to increase my paranoia level and search for even more possible races. I feel overwhelmed at the moment ;-) Best Regards, Petr