Re: On migrate_disable() and latencies

From: Nicholas Mc Guire <hidden>
Date: 2011-07-28 07:54:48
Also in: lkml

On Wed, 27 Jul 2011, Paul E. McKenney wrote:

On Wed, Jul 27, 2011 at 01:13:18PM +0200, Peter Zijlstra wrote:

quoted

On Mon, 2011-07-25 at 14:17 -0700, Paul E. McKenney wrote:

quoted

I suppose it is indeed. Even for the SoftRT case we need to make sure
the total utilization loss is indeed acceptable.

OK.  If you are doing strict priority, then everything below the highest
priority is workload dependent.

<snip throttling, that's a whole different thing>

quoted

 The higher-priority
tasks can absolutely starve the lower-priority ones, with or without
the migrate-disable capability.

Sure, that's how FIFO works, but it also relies on the fact that once
your high priority task completes the lower priority task resumes.

The extension to SMP is that we run the m highest priority tasks on n
cpus ; where m <= n. Any loss in utilization (idle time in this
particular case, but irq/preemption/migration and cache overhead are
also time not spend on the actual workload.

Now the WCET folks are all about quantifying the needs of applications
and the utilization limits of the OS etc. And while for SoftRT you can
relax quite a few of the various bounds you still need to know them in
order relax them (der Hofrat likes to move from worst case to avg case
IIRC).

its not about worst case vs. average case its about using the distribution
rather than boundary values - boundary values are hard to correlate with
specific events.

;-)

quoted

Another way of looking at it is from the viewpoint of the additional
priority-boost events.  If preemption is disabled, the low-priority task
will execute through the preempt-disable region without context switching.
In contrast, given a migration-disable region, the low-priority task
might be preempted and then boosted.  (If I understand correctly, if some
higher-priority task tries to enter the same type of migration-disable
region, it will acquire the associated lock, thus priority-boosting the
task that is already in that region.)

No, there is no boosting involved, migrate_disable() isn't intrinsically
tied to a lock or other PI construct. We might needs locks to keep some
of the per-cpu crap correct, but that again, is a whole different ball
game.

But even if it was, I don't think PI will help any for this, we still
need to complete the various migrate_disable() sections, see below.

OK, got it.  I think, anyway.  I was incorrectly (or at least unhelpfully)
pulling in locks that might be needed to handle per-CPU variables.

quoted

One stupid-but-tractable way to model this is to have an interarrival
rate for the various process priorities, and then calculate the odds of
(1) a higher priority process arriving while the low-priority one is
in a *-disable region and (2) that higher priority process needing to
enter a conflicting *-disable region.  This would give you some measure
of the added boosting load due to migration-disable as compared to
preemption-disable.

Would this sort of result be useful?

Yes, such type of analysis can be used, and I guess we can measure
various variables related to that.

OK, good.

quoted

My main worry with all this is that we have these insane long !preempt
regions in mainline that are now !migrate regions, and thus per all the
above we could be looking at a substantial utilization loss.

Alternatively we could all be missing something far more horrid, but
that might just be my paranoia talking.

Ah, good point -- if each migration-disable region is associated with
a lock, then you -could- allow migration and gain better utilization
at the expense of worse caching behavior.  Is that the concern?

I'm not seeing how that would be true, suppose you have this stack of 4
migrate_disable() sections and 3 idle cpus, no amount of boosting will
make the already running task at the top of the stack go any faster, and
it needs to complete the migrate_disable section before it can be
migrated, equally so for the rest, so you still need
3*migrate-disable-period of time before all your cpus are busy again.

You can move another task to the top of the stack by boosting, but
you'll need 3 tasks to complete their resp migrate-disable section, it
doesn't matter which task, so boosting doesn't change anything.

OK, so let me see if I understand what you are looking to model.

o	There are no locks.

o	There are a finite number of tasks with varying priorities.
	(I would initially work with a single task per priority
	level, but IIRC it is not hard to make multiple tasks per
	priority work.  Not a fan of infinite numbers of priorities,
	though!)

o	There are multiple CPUs.

o	Once a task enters a migrate-disable region, it must remain
	on that CPU.  (I will initially model the migrate-disable region
	as consuming a fixed amount of CPU.  If I wanted to really wuss
	out, I would model it as consuming an exponentially distributed
	amount of CPU.)

o	Tasks awakening outside of migrate-disable regions will pick
	the CPU running the lowest-priority task, whether or not this
	task is in migrate-disable state.  (At least I don't see
	anything in 3.0-rt3 that looks like a scheduling decision
	based on ->migrate_disable, perhaps due to blindness.)

This might be a simple heuristics to minimize the probability of stacking
in the first place.

o	For an example, if all CPUs except for one are running prio-99
	tasks, and the remaining CPU is running a prio-1 task in
	a migrate-disable region, if a prio-2 tasks awakens, it
	will preempt the prio-1 task.

all CPUs utilized so no utilization loss at all in that szenario

	On the other hand, if at least one of the CPUs was idle,
	the prio-2 task would have instead run on that idle CPU.

so what you need to add to the model is the probability of the transitional
event:

   * prio-2 task preempts prio-1 task because all CPUs are idle
   * atleast one CPU becomes idle while prio-1 task is blocked for migration
     due to migrate-disable + preemted by prio-2 task

only in this combination does the system suffer a utilization penalty.

o	The transition probabilities depend on the priority
	of the currently running migrate-disable CPU -- the higher
	that priority, the greater the chance that any preempting
	task will find some other CPU instead.

	The recurrence times depend on the number of tasks stacked
	up in migrate-disable regions on that CPU.

If this all holds, it would be possible to compute the probability
of a given migrate-disable region being preempted and if preempted,
the expected duration of that preemption, given the following
quantities as input:

o	The probability that a given CPU is running a task
	of priority P for each priority.  The usual way to
	estimate this is based on per-thread CPU utilizations.

o	The expected duration of migrate-disable regions.

o	The expected wakeups per second for tasks of each priority.

With the usual disclaimers about cheezy mathematical approximations
of reality and all that.

Would this be useful, or am I still missing the point?

to get an estimation of the latency impact - but to get a estimate of the
impact on system utilization you would need to include the probability that a 
different CPU is idle in the system and would in principle allow running
one of the tasks that can'b be migrated. As I understood it, the initial 
questions was if migrate_disable has a relevant impact on system utilization
in multicore systems. For this question I guess two of the key parameters are

 * probability that migrate-disable stacking occures 
 * probability of a idle CPU transition while stacking persists

I guess the probability of an idle transition of a CPU is hard to model as it
is very profile specific.

hofrat

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help