Thread (59 messages) 59 messages, 7 authors, 2016-02-11

Re: [RFCv6 PATCH 03/10] sched: scheduler-driven cpu frequency selection

From: Juri Lelli <hidden>
Date: 2015-12-15 10:31:37
Also in: lkml

On 14/12/15 18:02, Steve Muckle wrote:
Hi Juri,

Thanks for the review.

On 12/11/2015 03:04 AM, Juri Lelli wrote:
quoted
quoted
+config CPU_FREQ_GOV_SCHED
+	bool "'sched' cpufreq governor"
+	depends on CPU_FREQ
We depend on IRQ_WORK as well, which in turn I think depends on SMP. As
briefly discussed with Peter on IRC, we might want to use
smp_call_function_single_async() instead to break this dependecies
chain (and be able to use this governor on UP as well).
FWIW I don't see an explicit dependency of IRQ_WORK on SMP
Oh, right. I seemed to remember that, but now I couldn't find this
dependency anymore.
(init/Kconfig), nevertheless I'll take a look at moving to
smp_call_function_single_async() to reduce the dependency list of
sched-freq.
OK, great. I think there's still value in reducing the dependency list.
...
quoted
quoted
+	/* avoid race with cpufreq_sched_stop */
+	if (!down_write_trylock(&policy->rwsem))
+		return;
+
+	__cpufreq_driver_target(policy, freq, CPUFREQ_RELATION_L);
+
+	gd->throttle = ktime_add_ns(ktime_get(), gd->throttle_nsec);
As I think you proposed at Connect, we could use post frequency
transition notifiers to implement throttling. Is this something that you
already tried implementing/planning to experiment with?
I started to do this a while back and then decided to hold off. I think
(though I can't recall for sure) it may have been so I could
artificially throttle the rate of frequency change events further by
specifying an inflated frequency change time. That's useful to have as
we experiment with policy.

We probably want both of these mechanisms. Throttling at a minimum based
on transition end notifiers, and the option of throttling further for
policy purposes (at least for now, or as a debug option). Will look at
this again.
Yeah, looks good.
...
quoted
quoted
+static int cpufreq_sched_thread(void *data)
+{
+	struct sched_param param;
+	struct cpufreq_policy *policy;
+	struct gov_data *gd;
+	unsigned int new_request = 0;
+	unsigned int last_request = 0;
+	int ret;
+
+	policy = (struct cpufreq_policy *) data;
+	gd = policy->governor_data;
+
+	param.sched_priority = 50;
+	ret = sched_setscheduler_nocheck(gd->task, SCHED_FIFO, &param);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		do_exit(-EINVAL);
+	} else {
+		pr_debug("%s: kthread (%d) set to SCHED_FIFO\n",
+				__func__, gd->task->pid);
+	}
+
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+		new_request = gd->requested_freq;
+		if (new_request == last_request) {
+			schedule();
+		} else {
Shouldn't we have to do the following here?

@@ -125,9 +125,9 @@ static int cpufreq_sched_thread(void *data)
 	}
 
 	do {
-		set_current_state(TASK_INTERRUPTIBLE);
 		new_request = gd->requested_freq;
 		if (new_request == last_request) {
+			set_current_state(TASK_INTERRUPTIBLE);
 			schedule();
 		} else {
 			/*
Otherwise we set task to INTERRUPTIBLE state right after it has been
woken up.
The state must be set to TASK_INTERRUPTIBLE before the data used to
decide whether to sleep or not is read (gd->requested_freq in this case).

If it is set after, then once gd->requested_freq is read but before the
state is set to TASK_INTERRUPTIBLE, the other side may update
gd->requested_freq and issue a wakeup on the freq thread. The wakeup
will have no effect since the freq thread would still be TASK_RUNNING at
that time. The freq thread would proceed to go to sleep and the update
would be lost.
Mmm, I suggested that because I was hitting this while testing:

[   34.816158] ------------[ cut here ]------------
[   34.816177] WARNING: CPU: 2 PID: 1712 at kernel/kernel/sched/core.c:7617 __might_sleep+0x90/0xa8()
[   34.816188] do not call blocking ops when !TASK_RUNNING; state=1 set at [<c007c1f8>] cpufreq_sched_thread+0x80/0x2b0
[   34.816198] Modules linked in:
[   34.816207] CPU: 2 PID: 1712 Comm: kschedfreq:1 Not tainted 4.4.0-rc2+ #401
[   34.816212] Hardware name: ARM-Versatile Express
[   34.816229] [<c0018874>] (unwind_backtrace) from [<c0013f60>] (show_stack+0x20/0x24)
[   34.816243] [<c0013f60>] (show_stack) from [<c0448c98>] (dump_stack+0x80/0xb4)
[   34.816257] [<c0448c98>] (dump_stack) from [<c0029930>] (warn_slowpath_common+0x88/0xc0)
[   34.816267] [<c0029930>] (warn_slowpath_common) from [<c0029a24>] (warn_slowpath_fmt+0x40/0x48)
[   34.816278] [<c0029a24>] (warn_slowpath_fmt) from [<c0054764>] (__might_sleep+0x90/0xa8)
[   34.816291] [<c0054764>] (__might_sleep) from [<c0578400>] (cpufreq_freq_transition_begin+0x6c/0x13c)
[   34.816303] [<c0578400>] (cpufreq_freq_transition_begin) from [<c0578714>] (__cpufreq_driver_target+0x180/0x2c0)
[   34.816314] [<c0578714>] (__cpufreq_driver_target) from [<c007c14c>] (cpufreq_sched_try_driver_target+0x48/0x74)
[   34.816324] [<c007c14c>] (cpufreq_sched_try_driver_target) from [<c007c1e8>] (cpufreq_sched_thread+0x70/0x2b0)
[   34.816336] [<c007c1e8>] (cpufreq_sched_thread) from [<c004ce30>] (kthread+0xf4/0x114)
[   34.816347] [<c004ce30>] (kthread) from [<c000fdd0>] (ret_from_fork+0x14/0x24)
[   34.816355] ---[ end trace 30e92db342678467 ]---

Maybe we could cope with what you are saying with an atomic flag
indicating that the kthread is currently servicing a request? Like
extending the finish_last_request thing to cover this case as well.

Best,

- Juri
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help