Re: [RFC][PATCH RT 3/4] sched/rt: Use IPI to trigger RT task push migration... | linux-rt-users

Re: [RFC][PATCH RT 3/4] sched/rt: Use IPI to trigger RT task push migration instead of pulling

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2012-12-11 14:03:02
Also in: lkml

On Tue, 2012-12-11 at 13:43 +0100, Thomas Gleixner wrote:

On Mon, 10 Dec 2012, Steven Rostedt wrote:

On Mon, 2012-12-10 at 17:15 -0800, Frank Rowand wrote:

I should have also mentioned some previous experience using IPIs to
avoid runq lock contention on wake up.  Someone encountered IPI
storms when using the TTWU_QUEUE feature, thus it defaults to off
for CONFIG_PREEMPT_RT_FULL:

  #ifndef CONFIG_PREEMPT_RT_FULL
  /*
   * Queue remote wakeups on the target CPU and process them
   * using the scheduler IPI. Reduces rq->lock contention/bounces.
   */
  SCHED_FEAT(TTWU_QUEUE, true)
  #else
  SCHED_FEAT(TTWU_QUEUE, false)
  #endif

Interesting, but I'm wondering if this also does it for every wakeup? If
you have 1000 tasks waking up on another CPU, this could potentially
send out 1000 IPIs. The number of IPIs here looks to be # of tasks
waking up, and perhaps more than that, as there could be multiple
instances that try to wake up the same task.

Not using the TTWU_QUEUE feature limits the IPIs to a single one,
which is only sent if the newly woken task preempts the current task
on the remote cpu and the NEED_RESCHED flag was not yet set.
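
The rule described above can be sketched as a small model. This is not the kernel's code: `struct remote_cpu`, `wake_needs_ipi()` and the "larger number means higher priority" convention are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a remote CPU; not a kernel structure. */
struct remote_cpu {
	int  curr_prio;     /* priority of the running task; larger = more important (assumption) */
	bool need_resched;  /* models the NEED_RESCHED flag of the running task */
};

/*
 * Returns true when this wakeup would send a reschedule IPI:
 * the woken task must preempt the running one, and no reschedule
 * may already be pending on that CPU.
 */
static bool wake_needs_ipi(struct remote_cpu *cpu, int woken_prio)
{
	assert(cpu);

	if (woken_prio <= cpu->curr_prio)
		return false;   /* no preemption, so no IPI */
	if (cpu->need_resched)
		return false;   /* reschedule already pending, so no second IPI */

	cpu->need_resched = true;
	return true;
}
```

In this model, any number of wakeups aimed at the same CPU produces at most one IPI until the flag is cleared again.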
 
With TTWU_QUEUE you can induce massive latencies just by starting
hackbench. You get a herd wakeup on CPU0 which then enqueues hundreds
of tasks to the remote pull list and sends IPIs. The remote CPUs pull
the tasks and activate them on their runqueues in hard interrupt
context. That can easily accumulate to hundreds of microseconds when
you do a mass push of newly woken tasks.

Of course it avoids fiddling with the remote rq lock, but it becomes
massively non-deterministic.

Agreed. I never suggested using TTWU_QUEUE. I was just stating the
difference between that and my patches.

Now with this patch set, the # of IPIs is limited to the # of CPUs. If you
have 4 CPUs, you'll get a storm of 3 IPIs. That's a big difference.
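
A rough way to compare the two worst cases discussed in this thread. The function names are hypothetical, and the per-wakeup figure is only the estimate given above (one IPI per woken task), not a measured bound.

```c
#include <assert.h>

/* TTWU_QUEUE-style estimate from the thread: one IPI per remote wakeup. */
static unsigned int ipis_per_wakeup(unsigned int nr_wakeups)
{
	return nr_wakeups;
}

/* Push-IPI estimate from the thread: at most one IPI per other CPU. */
static unsigned int ipis_per_cpu(unsigned int nr_cpus)
{
	assert(nr_cpus >= 1);   /* the sending CPU itself always exists */
	return nr_cpus - 1;
}
```

With 1000 wakeups on a 4-CPU machine, the first estimate gives 1000 IPIs and the second gives 3.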

Yeah, the big difference is that you offload the double lock to the
IPI. So in the worst case you interrupt the most latency sensitive
task running on the remote CPU. Not sure if I really like that
"feature".

First, the pulled CPU isn't necessarily running the most latency
sensitive task. It just happens to be running more than one RT task, and
the waiting RT task can migrate. The running task may be of the same
priority as the waiting task. And they both may be the lowest priority
RT tasks in the system, and a CPU just went idle.

Currently, what we have is huge contention on the pulled CPU's rq
lock. We've measured over 500us latencies due to it. This hurts even the
CPU that has the overloaded task, as the contention is on its lock.

-- Steve