Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
From: Shrikanth Hegde <hidden>
Date: 2025-11-08 12:05:21
On 9/11/25 10:53 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
>
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>  		if (!is_rd_overutilized(this_rq()->rd)) {
>>  			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>>  			if (new_cpu >= 0)
>> -				return new_cpu;
>> +				goto check_new_cpu;
>
> Should this fallback to the overutilized path if the most energy
> efficient CPU is found to be paravirtualized or should
> find_energy_efficient_cpu() be made aware of it?
While thinking about this: are there any systems that have vCPUs, overcommit them, and still have an energy model backing them? That seems highly unlikely. So I am planning to put a warning there and see whether any such usage exists.
>>  			new_cpu = prev_cpu;
>>  		}
>> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>  	}
>>  	rcu_read_unlock();
>>
>> -	return new_cpu;
>> +	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
>> +check_new_cpu:
>> +	if (is_cpu_paravirt(new_cpu))
>> +		return cpu;
>> +	else
>
> nit. redundant else.
>
>> +		return new_cpu;
>>  }
>>
>>  /*
>> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>>  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>>
>> +#ifdef CONFIG_PARAVIRT
>> +	/* Don't spread load to paravirt CPUs */
>> +	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
>> +		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
>> +#endif
>
> Can something similar also be done in select_idle_sibling() and
> sched_balance_find_dst_cpu() for wakeup path?
We have this pattern in select_task_rq_fair():

	cpu = smp_processor_id();
	new_cpu = prev_cpu;

The task is waking up after a while, so prev_cpu is likely to have been marked as paravirt in the meantime, and in such cases we should return the current cpu. If the current cpu is paravirt (unlikely) and prev_cpu is also paravirt, we should still return the current cpu. The task will be pushed out at the next sched tick.

	select_idle_sibling(p, prev_cpu, new_cpu);
	  - (new_cpu will remain prev_cpu if wake_affine doesn't change it)

Will have to change the prototype to send the current cpu as well.
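
To make the intended behaviour concrete, here is a rough userspace sketch of the two checks discussed above. It is not kernel code and not the final patch. The 64-bit paravirt_mask and the filter_paravirt() helper are stand-ins invented for illustration (the series uses cpu_paravirt_mask with the cpumask API). check_new_cpu() only mirrors the check_new_cpu: label from the patch, with the redundant else dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_paravirt_mask: bit N set => CPU N is paravirt. */
static uint64_t paravirt_mask;

/* Assumes 0 <= cpu < 64 in this simplified model. */
static bool is_cpu_paravirt(int cpu)
{
	return (paravirt_mask >> cpu) & 1;
}

/*
 * Wakeup path: new_cpu starts out as prev_cpu. If the chosen CPU is
 * paravirt, fall back to the waking (current) CPU, even if that one is
 * paravirt too; the task is then expected to be pushed out at the next tick.
 */
static int check_new_cpu(int cpu, int new_cpu)
{
	if (is_cpu_paravirt(new_cpu))
		return cpu;
	return new_cpu;
}

/*
 * Load-balance path: drop paravirt CPUs from the candidate set, in the
 * spirit of cpumask_andnot(cpus, cpus, cpu_paravirt_mask).
 */
static uint64_t filter_paravirt(uint64_t cpus)
{
	return cpus & ~paravirt_mask;
}
```

With CPU 2 marked paravirt, check_new_cpu(0, 2) returns 0 and filter_paravirt() clears bit 2 from the candidate set. Doing the same inside select_idle_sibling() would need the current cpu passed in, which is why the prototype has to change.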