Thread (40 messages), 6 authors, 2014-08-04

TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace_enter into two phases)

From: Frederic Weisbecker <hidden>
Date: 2014-07-31 00:30:46
Also in: linux-arch, linux-mips, lkml

On Wed, Jul 30, 2014 at 07:46:30PM +0200, Oleg Nesterov wrote:
On 07/30, Frederic Weisbecker wrote:
On Tue, Jul 29, 2014 at 07:54:14PM +0200, Oleg Nesterov wrote:
Looks like we can kill context_tracking_task_switch() and simply change the
"__init" callers of context_tracking_cpu_set() to do set_thread_flag(TIF_NOHZ)?
Then this flag will be propagated by copy_process().
Right, that would be much better. Good catch! Context tracking is enabled from
tick_nohz_init(). This is the init 0 task, so the flag should be propagated from there.
Actually it is the init 1 task, but this doesn't matter.
Are you sure? It does matter because that would invalidate everything I understood
about init/main.c :) I was convinced that the very first kernel init task is PID 0, and that
it forks in rest_init() to launch the userspace init with PID 1. Then init/0 becomes the
idle task of the boot CPU.
I still think we need a for_each_process_thread() set as well though because some
kernel threads may well have been created at this stage already.
Yes... Or we can add set_thread_flag(TIF_NOHZ) into ____call_usermodehelper().
Couldn't there be tasks other than usermodehelper ones at this stage? Like workqueues
or random kernel threads?
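
To make the concern concrete, here is a minimal userspace sketch of the idea, not kernel code. All toy_* names and the TOY_TIF_NOHZ bit are hypothetical stand-ins: toy_fork() models the flag copy attributed to copy_process() above, toy_set_nohz() models set_thread_flag(TIF_NOHZ) on the calling task, and toy_set_nohz_all() models a for_each_process_thread() sweep.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TIF_NOHZ (1u << 0)  /* stand-in bit, not the real TIF_NOHZ value */
#define TOY_MAX_TASKS 8

struct toy_task {
    int id;              /* arbitrary label, not a real PID */
    unsigned int flags;  /* toy thread flags */
};

static struct toy_task toy_tasks[TOY_MAX_TASKS];
static size_t toy_nr_tasks;

/* Forget all tasks so a scenario can start from scratch. */
static void toy_reset(void)
{
    toy_nr_tasks = 0;
}

/* Models the copy_process() behaviour discussed above: the child starts
 * with a copy of the parent's flags (no parent means no flags). */
static struct toy_task *toy_fork(const struct toy_task *parent, int id)
{
    struct toy_task *child;

    assert(toy_nr_tasks < TOY_MAX_TASKS);
    child = &toy_tasks[toy_nr_tasks++];
    child->id = id;
    child->flags = parent ? parent->flags : 0;
    return child;
}

/* Models set_thread_flag(TIF_NOHZ) on the task doing the __init call. */
static void toy_set_nohz(struct toy_task *t)
{
    t->flags |= TOY_TIF_NOHZ;
}

/* Models a for_each_process_thread() sweep over every existing task. */
static void toy_set_nohz_all(void)
{
    for (size_t i = 0; i < toy_nr_tasks; i++)
        toy_tasks[i].flags |= TOY_TIF_NOHZ;
}
```

In this model a task forked before toy_set_nohz() never receives the flag through inheritance, while every task forked afterwards does. Only the sweep covers the early ones, which is the case the workqueue and kernel-thread question is about.
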
Or I am totally confused? (quite possible).
So here is a scenario where this is a problem: a task runs on CPU 0, passes the context
tracking call before returning from a syscall to userspace, and gets an interrupt. The
interrupt preempts the task and it moves to CPU 1. So it returns from preempt_schedule_irq()
after which it is going to resume to userspace.

In this scenario, if context tracking is only enabled on CPU 1, we have no way to know that
the task is resuming to userspace, because we passed through the context tracking probe
already and it was ignored on CPU 0.
Thanks. But I still can't understand... So if we only track CPU 1, then in this
case context_tracking.state == IN_USER on CPU 0, but it can be IN_USER or IN_KERNEL
on CPU 1.
I'm not sure I understand your question.
Probably because it was stupid. Seriously, I still have no idea what this code
actually does.
Context tracking is either enabled everywhere or
nowhere.

I need to say though that there is a per-CPU context tracking state named context_tracking.active.
It's confusing because it suggests that context tracking is active per CPU. Actually the context is
tracked everywhere when globally enabled, but active determines whether we call the RCU and vtime
callbacks or not.

So only nohz full CPUs have context_tracking.active set because only these need to call the RCU
and vtime callbacks. Other CPUs still do the context tracking but they won't call rcu and vtime
functions.
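
The distinction between the global switch and the per-CPU active field can be sketched as a small userspace model. This is an illustration only: the toy_* names are hypothetical, and the callbacks counter merely stands in for the RCU and vtime calls mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_ctx_state { TOY_IN_KERNEL = 0, TOY_IN_USER };

/* Toy stand-in for the per-CPU context tracking data. */
struct toy_ct {
    bool active;               /* should this CPU call RCU/vtime callbacks? */
    enum toy_ctx_state state;  /* last context recorded on this CPU */
    int callbacks;             /* counts simulated RCU/vtime invocations */
};

/* Toy stand-in for the global "context tracking enabled" switch. */
static bool toy_ct_enabled;

/* Shared transition logic: the state is recorded on every CPU, but the
 * callbacks only fire where active is set. */
static void toy_transition(struct toy_ct *ct, enum toy_ctx_state next)
{
    assert(ct != NULL);
    if (!toy_ct_enabled)
        return;               /* globally off: nothing is tracked */
    if (ct->state != next) {
        if (ct->active)
            ct->callbacks++;  /* only nohz full CPUs in this model */
        ct->state = next;
    }
}

static void toy_user_enter(struct toy_ct *ct)
{
    toy_transition(ct, TOY_IN_USER);
}

static void toy_user_exit(struct toy_ct *ct)
{
    toy_transition(ct, TOY_IN_KERNEL);
}
```

With the global switch on, both kinds of CPU record IN_USER and IN_KERNEL transitions, but only the one with active set accumulates callback invocations.
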
I meant that in the scenario you described above the "global" TIF_NOHZ doesn't
really make a difference, afaics.

Let's assume that context tracking is only enabled on CPU 1. To simplify,
assume that we have a single usermode task T which sleeps in kernel mode.

So context_tracking[0].state == context_tracking[1].state == IN_KERNEL.

T wakes up on CPU_0, returns to user space, calls user_enter(). This sets
context_tracking[0].state = IN_USER but does nothing else, because this
CPU is not tracked and .active is false.

Right after local_irq_restore() this task can migrate to CPU_1 and finish
its ret-to-usermode path. But since it had already passed user_enter() we
do not change context_tracking[1].state and do not play with rcu/vtime.
(unless this task hits SCHEDULE_USER in asm).

The same for user_exit() of course.
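
The walk-through above can be replayed in a small userspace model. This is a sketch under simplifying assumptions, not kernel code: the sim_* names are hypothetical, two CPUs are modelled with only CPU 1 marked active, and migration is reduced to changing the task's CPU number after the probe has already run.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 2

enum sim_state { SIM_IN_KERNEL = 0, SIM_IN_USER };

/* Toy per-CPU context tracking data. */
struct sim_cpu {
    bool active;           /* true only on the tracked (nohz full) CPU */
    enum sim_state state;  /* last context recorded on this CPU */
    int callbacks;         /* counts simulated RCU/vtime invocations */
};

static struct sim_cpu sim_cpus[SIM_NR_CPUS];

/* Toy task: all we need is which CPU it currently runs on. */
struct sim_task {
    int cpu;
};

/* Both CPUs start IN_KERNEL; only CPU 1 is active, as in the scenario. */
static void sim_reset(void)
{
    for (int i = 0; i < SIM_NR_CPUS; i++) {
        sim_cpus[i].active = (i == 1);
        sim_cpus[i].state = SIM_IN_KERNEL;
        sim_cpus[i].callbacks = 0;
    }
}

/* The probe updates the state of whichever CPU the task is on right now. */
static void sim_user_enter(const struct sim_task *t)
{
    struct sim_cpu *c;

    assert(t->cpu >= 0 && t->cpu < SIM_NR_CPUS);
    c = &sim_cpus[t->cpu];
    if (c->state != SIM_IN_USER) {
        if (c->active)
            c->callbacks++;
        c->state = SIM_IN_USER;
    }
}

/* Migration after the probe: no further tracking call is made. */
static void sim_migrate(struct sim_task *t, int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    t->cpu = cpu;
}
```

Running sim_user_enter() for a task on CPU 0 and then calling sim_migrate() to CPU 1 leaves CPU 0 recorded as IN_USER and CPU 1 still IN_KERNEL with no callbacks fired, even though the task finishes its return to user mode on CPU 1.
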
So indeed, if context tracking is enabled on CPU 1 and not on CPU 0, we risk
a situation where CPU 1 has the wrong context tracking state.

But a global TIF_NOHZ should enforce context tracking everywhere. It also means
less context switch overhead.
Oleg.