Thread (6 messages), 3 authors, 2014-09-03

Re: set_schedattr + cpuset issue

From: Vincent Legout <hidden>
Date: 2014-08-28 21:18:01
Subsystem: scheduler, the rest · Maintainers: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds

Hello,

Juri Lelli [off-list ref] writes:
> On Wed, 2 Jul 2014 17:08:47 -0400
> Kevin Burns [off-list ref] wrote:
>
>> Here's the issue:
>>
>> I am able to allocate a bandwidth with a ratio of .1 to two processes using
>> the sched_setattr() system call.
>>
>> I then am able to add said tasks to a cpuset (with one physical processor)
>> using cset.
>>
>> However, when I then try to update the runtime or period of either task,
>> sched_setattr returns a -EBUSY error.
>>
>> Now, if I repeat the above experiment with just one task, I am able to
>> update the runtime or period without issue. I ran trace-cmd and kernelshark
>> to verify that the bandwidths were indeed being updated correctly. That and
>> htop was reporting a higher percentage of CPU usage, which correlated to the
>> ratios of my task's bandwidth.
>>
>> Any ideas as to why cpuset would cause this behaviour?
>
> Could you create a script that I can use to run your setup and reproduce
> the problem?

Sorry for the delayed answer. I'm working with Kevin and the problem can
be reproduced using the attached files, also available here:

 http://legout.info/~vincent/sd/

On an Ubuntu 14.04 system running Linux 3.16, when running run.sh for the
2nd time, the 2nd call to sched_setattr() returns EBUSY. Uncommenting
line 41 of run.sh fixes this by returning to SCHED_OTHER before moving
the task to the cpuset.
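
For readers without the attached files, the workaround can be sketched in C
roughly as follows. This is only an illustration, not the contents of run.sh:
the struct name dl_sched_attr and the helpers make_attr(), set_attr() and
leave_deadline() are hypothetical, the field layout follows the struct
sched_attr shown in the SCHED_DEADLINE documentation, and the raw syscall is
used on the assumption that the C library provides no wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define POLICY_OTHER    0	/* SCHED_OTHER */
#define POLICY_DEADLINE 6	/* SCHED_DEADLINE */

/* Local copy of the documented sched_attr layout (hypothetical name). */
struct dl_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* nanoseconds */
	uint64_t sched_deadline;	/* nanoseconds */
	uint64_t sched_period;		/* nanoseconds */
};

/* Build an attribute block; pass zeros for non-deadline policies. */
static struct dl_sched_attr make_attr(uint32_t policy, uint64_t runtime_ns,
				      uint64_t deadline_ns, uint64_t period_ns)
{
	struct dl_sched_attr a;

	memset(&a, 0, sizeof(a));
	a.size = sizeof(a);
	a.sched_policy = policy;
	a.sched_runtime = runtime_ns;
	a.sched_deadline = deadline_ns;
	a.sched_period = period_ns;
	return a;
}

/* Thin wrapper around the raw syscall; returns 0, or -1 with errno set. */
static long set_attr(pid_t pid, const struct dl_sched_attr *a)
{
	assert(a != NULL);
	return syscall(SYS_sched_setattr, pid, a, 0U);
}

/* Workaround step: drop back to SCHED_OTHER before changing cpuset. */
static long leave_deadline(pid_t pid)
{
	struct dl_sched_attr other = make_attr(POLICY_OTHER, 0, 0, 0);

	return set_attr(pid, &other);
}
```

The intended order would be: call leave_deadline(pid), move the task to the
new cpuset (for example with cset), then call set_attr() again with
POLICY_DEADLINE and parameters such as 10 ms runtime over a 100 ms period
(illustrative values giving the .1 ratio mentioned above).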

The problem arises when using both cpusets and SCHED_DEADLINE. The
process described in section 5.1 of the SCHED_DEADLINE documentation
works fine if the process stays on the same cpuset, but I think there
are some issues when moving a process already in the SCHED_DEADLINE
policy from one cpuset to another.

According to our experiments, it seems that some fields are not updated
during this process, and it thus fails. When a task moves from one
cpuset to another, the total_bw fields of both cpusets don't seem to
be updated. Thus, in the next sched_setattr() call, __dl_overflow()
returns 1 because it thinks total_bw is 0 in the new cpuset. Then,
dl_overflow() returns -1 and we have an EBUSY error.

The total_bw field may also overflow because __dl_clear and __dl_add are
called while the task whose bandwidth is tsk_bw is not in the cpu
represented by dl_b.
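
To make the arithmetic concrete, here is a small user-space model of the
admission test. It is a sketch under assumptions, not the kernel code: the
names dl_bw_model, to_ratio_model() and dl_overflow_model() are hypothetical,
the 20-bit fixed-point shift and the shape of the comparison are written from
my reading of v3.16, and the bandwidth values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20	/* assumed fixed-point shift, as in to_ratio() */

/* Simplified stand-in for the kernel's struct dl_bw. */
struct dl_bw_model {
	uint64_t bw;		/* per-CPU bandwidth cap, fixed point */
	uint64_t total_bw;	/* sum of admitted task bandwidths */
};

/* runtime/period as a fixed-point ratio, in the style of to_ratio(). */
static uint64_t to_ratio_model(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/*
 * Admission test in the style of __dl_overflow(): true means "reject".
 * The arithmetic is unsigned, so total_bw - old_bw wraps around when
 * old_bw was never added to this total_bw.
 */
static bool dl_overflow_model(const struct dl_bw_model *b, int cpus,
			      uint64_t old_bw, uint64_t new_bw)
{
	assert(cpus > 0);
	return b->bw != UINT64_MAX &&
	       b->bw * (uint64_t)cpus < b->total_bw - old_bw + new_bw;
}
```

With an illustrative 95% cap on one CPU, a task going from a .2 to a .1 ratio
is accepted when total_bw already contains its old .2 share. If total_bw is
still 0 because the move between cpusets did not transfer that share, the
subtraction wraps to a huge value and the same request is rejected, which is
one way the stale accounting could surface as EBUSY.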

We can get around this by moving the process back to another scheduling
policy before moving it to another cpuset. But we also had to apply the
following patch in order to be sure that the bandwidth is always updated
(on top of v3.16). I think this condition was added to skip all
the tests if the bandwidth doesn't change. But there is an issue because
then, the total_bw field is not going to be updated for the new cpu. I
think the problem comes from the fact that p->dl.dl_bw is not updated
when a task leaves or returns to the SCHED_DEADLINE policy.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..0df3008 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2031,9 +2031,6 @@ static int dl_overflow(struct task_struct *p, int policy,
        u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
        int cpus, err = -1;
 
-       if (new_bw == p->dl.dl_bw)
-               return 0;
-
        /*
         * Either if a task, enters, leave, or stays -deadline but changes
         * its parameters, we may need to update accordingly the total

I hope the above explanations make sense and I didn't miss anything
trivial. I'd be happy to provide more information or test anything if
needed.

Thanks,
Vincent
