Thread: 33 messages, 5 authors, 2025-11-10

Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance

From: Shrikanth Hegde <hidden>
Date: 2025-11-08 12:05:21
Also in: lkml


On 9/11/25 10:53 AM, K Prateek Nayak wrote:
Hello Shrikanth,

On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
@@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
  		if (!is_rd_overutilized(this_rq()->rd)) {
  			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
  			if (new_cpu >= 0)
-				return new_cpu;
+				goto check_new_cpu;
Should this fall back to the overutilized path if the most energy
efficient CPU is found to be paravirtualized, or should
find_energy_efficient_cpu() be made aware of it?

While thinking about this: are there any such systems that have vCPUs,
overcommit, and still have an energy model backing them?

Highly unlikely. So I am planning to put a warning there and see whether
any such usage exists.
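One way to combine the two ideas above (fall back to the overutilized path instead of using a paravirt EAS result, and warn if that ever happens) can be modelled in userspace C. This is a sketch under stated assumptions, not kernel code: CPU sets are plain 64-bit masks; `pick_wakeup_cpu()`, `cpu_in_mask()` and `eas_paravirt_hits` are hypothetical names; `eas_cpu` stands in for the return value of find_energy_efficient_cpu(), `slow_cpu` for whatever the overutilized path would pick, and a print-once counter replaces WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: bit N of a mask set means CPU N has that property. */
static bool cpu_in_mask(uint64_t mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (mask >> cpu) & 1u;
}

/* Stand-in for WARN_ON_ONCE(): count every hit, print only the first. */
static int eas_paravirt_hits;

/*
 * eas_cpu:  result of the energy-aware pick, negative if none was found.
 * slow_cpu: what the overutilized (slow) path would have chosen.
 * A paravirt EAS pick is reported and discarded in favor of the slow path.
 */
static int pick_wakeup_cpu(int eas_cpu, int slow_cpu, uint64_t paravirt_mask)
{
	if (eas_cpu >= 0) {
		if (!cpu_in_mask(paravirt_mask, eas_cpu))
			return eas_cpu;
		if (eas_paravirt_hits++ == 0)
			fprintf(stderr, "EAS picked paravirt CPU %d\n", eas_cpu);
	}
	return slow_cpu;
}
```

If the warning never fires in practice, the fallback branch costs only a mask test on the EAS path.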
  			new_cpu = prev_cpu;
  		}
  
@@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
  	}
  	rcu_read_unlock();
  
-	return new_cpu;
+	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
+check_new_cpu:
+	if (is_cpu_paravirt(new_cpu))
+		return cpu;
+	else
nit. redundant else.
+		return new_cpu;
  }
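With the nit applied, the tail of select_task_rq_fair() needs no else. A minimal userspace model of that check, assuming a 64-bit mask in place of cpu_paravirt_mask; `is_cpu_paravirt_model()` and `finish_wakeup_cpu()` are hypothetical names, and `this_cpu` plays the role of `cpu` in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_paravirt_mask: bit N set => CPU N is paravirt. */
static bool is_cpu_paravirt_model(uint64_t paravirt_mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (paravirt_mask >> cpu) & 1u;
}

/* If the chosen (or previous) CPU is paravirt, use the waking CPU instead. */
static int finish_wakeup_cpu(int new_cpu, int this_cpu, uint64_t paravirt_mask)
{
	if (is_cpu_paravirt_model(paravirt_mask, new_cpu))
		return this_cpu;

	return new_cpu;
}
```

Note that if the waking CPU is itself paravirt, this still returns it; the task is then expected to be pushed out at a later tick.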
  
  /*
@@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
  
  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
  
+#ifdef CONFIG_PARAVIRT
+	/* Don't spread load to paravirt CPUs */
+	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
+		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
+#endif
Can something similar also be done in select_idle_sibling() and
sched_balance_find_dst_cpu() for the wakeup path?
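The mask arithmetic in the sched_balance_rq() hunk, and what reusing it for a wakeup-side scan might look like, can be modelled as below. This is an illustrative sketch with hypothetical names (`balance_candidates()`, `first_candidate()`), plain 64-bit masks instead of struct cpumask, and a lowest-numbered-CPU scan standing in for the real idle-CPU search in select_idle_sibling() or sched_balance_find_dst_cpu().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Candidates = domain span restricted to active CPUs, minus paravirt CPUs
 * when pushing is enabled. Bit N set => CPU N is in the set.
 */
static uint64_t balance_candidates(uint64_t span, uint64_t active,
				   uint64_t paravirt, bool push_tasks)
{
	uint64_t cpus = span & active;	/* like cpumask_and() */

	if (push_tasks)
		cpus &= ~paravirt;	/* like cpumask_andnot() */
	return cpus;
}

/* Lowest-numbered CPU in the set, or -1 if the set is empty. */
static int first_candidate(uint64_t cpus)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if ((cpus >> cpu) & 1u)
			return cpu;
	return -1;
}
```

Because the paravirt bits are cleared before the scan, no search over the filtered set can return a paravirt CPU, which is the property the question asks about for the wakeup path.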
We have this pattern in select_task_rq_fair():

cpu = smp_processor_id();
new_cpu = prev_cpu;

If the task is waking up after a while, prev_cpu has likely been marked
paravirt, and in such cases we should return the current CPU. If the current
CPU is also paravirt (unlikely) and prev_cpu is paravirt too, we should still
return the current CPU; the task will be pushed out at the next sched tick.

select_idle_sibling(p, prev_cpu, new_cpu);  (new_cpu remains prev_cpu unless wake_affine() changes it)
We would have to change its prototype to pass the current CPU as well.
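A greatly simplified model of the prototype change described above: select_idle_sibling() receives the waking CPU so that, when nothing idle and non-paravirt is found and the target itself is paravirt, it can return the waking CPU. Everything here is hypothetical (`select_idle_sibling_model()`, the `idle` and `paravirt` masks, the search order); the real function also considers recent_used_cpu, SMT siblings and the LLC domain, among other things, none of which is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool cpu_set(uint64_t mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (mask >> cpu) & 1u;
}

/* idle / paravirt: bit N set => CPU N is idle / paravirt. */
static int select_idle_sibling_model(int prev, int target, int this_cpu,
				     uint64_t idle, uint64_t paravirt)
{
	uint64_t usable = idle & ~paravirt;

	if (cpu_set(usable, target))
		return target;
	if (cpu_set(usable, prev))
		return prev;
	for (int cpu = 0; cpu < 64; cpu++)
		if (cpu_set(usable, cpu))
			return cpu;

	/*
	 * Nothing idle and non-paravirt: avoid a paravirt target by using the
	 * waking CPU, even if it is paravirt too (to be pushed at a later tick).
	 */
	if (cpu_set(paravirt, target))
		return this_cpu;
	return target;
}
```

Passing `this_cpu` in keeps the fallback decision inside the idle search instead of repeating the paravirt check after it returns.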