Re: set_schedattr + cpuset issue
From: Vincent Legout <hidden>
Date: 2014-08-28 21:18:01
Subsystem:
scheduler, the rest · Maintainers:
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Linus Torvalds
Hello,

Juri Lelli [off-list ref] writes:

> On Wed, 2 Jul 2014 17:08:47 -0400 Kevin Burns [off-list ref] wrote:
>
>> Here's the issue: I am able to allocate a bandwidth with a ratio of .1 to two processes using the sched_setattr() system call. I then am able to add said tasks to a cpuset (with one physical processor) using cset. However, when I then try to update the runtime or period of either task, sched_setattr returns a -EBUSY error.
>>
>> Now, if I repeat the above experiment with just one task, I am able to update the runtime or period without issue. I ran trace-cmd and kernelshark to verify that the bandwidths were indeed being updated correctly. That and htop was reporting a higher percentage of CPU usage, which correlated to the ratios of my tasks' bandwidth.
>>
>> Any ideas as to why cpuset would cause this behaviour?
>
> Could you create a script that I can use to run your setup and reproduce the problem?
Sorry for the delayed answer. I'm working with Kevin, and the problem can be reproduced using the attached files, also available here: http://legout.info/~vincent/sd/

On an Ubuntu 14.04 system running Linux 3.16, when running run.sh for the 2nd time, the 2nd call to sched_setattr() returns EBUSY. Uncommenting line 41 of run.sh fixes this by returning to SCHED_OTHER before moving the task to the cpuset.

The problem arises when using both cpusets and SCHED_DEADLINE. The process described in section 5.1 of the SCHED_DEADLINE documentation works fine if the process stays on the same cpuset, but I think there are some issues when moving a process already in the SCHED_DEADLINE policy from one cpuset to another. According to our experiments, it seems that some fields are not updated during this move, and it thus fails.

When a task moves from one cpuset to another, the total_bw fields of both cpusets don't seem to be updated. Thus, in the next sched_setattr() call, __dl_overflow() returns 1 because it thinks total_bw is 0 in the new cpuset. Then, dl_overflow() returns -1 and we have an EBUSY error. The total_bw field may also overflow because __dl_clear and __dl_add are called while the task whose bandwidth is tsk_bw is not on the cpu represented by dl_b.

We can get around this by moving the process back to another scheduling policy before moving it to another cpuset. But we also had to apply the following patch in order to be sure that the bandwidth is always updated (on top of v3.16). I think this condition was added to skip all the tests if the bandwidth doesn't change. But there is an issue because then the total_bw field is not going to be updated for the new cpu. I think the problem comes from the fact that p->dl.dl_bw is not updated when a task leaves or returns to the SCHED_DEADLINE policy.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..0df3008 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2031,9 +2031,6 @@ static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
-	if (new_bw == p->dl.dl_bw)
-		return 0;
-
 	/*
 	 * Either if a task, enters, leave, or stays -deadline but changes
 	 * its parameters, we may need to update accordingly the total
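
To make the admission arithmetic described above easier to follow, here is a minimal user-space sketch. It is not the kernel code: struct dl_bw_model and every *_model function are hypothetical names made up for this illustration, there is no locking, and the 2^20 fixed-point scale and the form of the comparison are assumptions about how to_ratio() and __dl_overflow() behave in v3.16. It only shows how a stale total_bw can turn a harmless parameter change into a refusal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed-point scale for bandwidth ratios (runtime / period). */
#define BW_SHIFT_MODEL 20

/* Hypothetical, simplified stand-in for the per-root-domain accounting. */
struct dl_bw_model {
	uint64_t bw;       /* per-CPU limit; UINT64_MAX means "no limit" */
	uint64_t total_bw; /* sum of the bandwidths admitted so far */
};

/* runtime/period as a fixed-point ratio, modelled after to_ratio(). */
static inline uint64_t to_ratio_model(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT_MODEL) / period;
}

/*
 * Returns true when replacing old_bw by new_bw would exceed the limit.
 * All operands are unsigned, so total_bw - old_bw + new_bw wraps around
 * to a huge value if total_bw does not actually contain old_bw and the
 * result would be negative.
 */
static inline bool dl_overflow_check_model(const struct dl_bw_model *b,
					   int cpus, uint64_t old_bw,
					   uint64_t new_bw)
{
	return b->bw != UINT64_MAX &&
	       b->bw * (uint64_t)cpus < b->total_bw - old_bw + new_bw;
}

/*
 * Models the "task stays -deadline but changes its parameters" case:
 * returns -1 (the caller would report EBUSY) or 0 after updating total_bw.
 */
static inline int dl_change_bw_model(struct dl_bw_model *b, int cpus,
				     uint64_t old_bw, uint64_t new_bw)
{
	if (dl_overflow_check_model(b, cpus, old_bw, new_bw))
		return -1;
	b->total_bw -= old_bw; /* what __dl_clear() is assumed to do */
	b->total_bw += new_bw; /* what __dl_add() is assumed to do */
	return 0;
}
```

In this model, take a one-CPU cpuset with a 95% limit and a task at 10 ms / 100 ms. If total_bw really includes that task, lowering its runtime to 5 ms is accepted. If total_bw is still 0 because the task was moved in without the accounting following it, the same request makes total_bw - old_bw + new_bw wrap around and the check refuses it, although the cpuset is almost idle.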

I hope the above explanations make sense and I didn't miss anything trivial. I'd be happy to provide more information or test anything if needed.

Thanks,
Vincent