Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: 2021-09-17 07:41:37
On Fri, 17 Sept 2021 at 03:01, Ricardo Neri [off-list ref] wrote:
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri [off-list ref] wrote:
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <redacted>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <redacted>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <redacted>
> > > Cc: Rafael J. Wysocki <redacted>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <redacted>
> > > Reviewed-by: Joel Fernandes (Google) <redacted>
> > > Reviewed-by: Len Brown <redacted>
> > > Signed-off-by: Ricardo Neri <redacted>
> > > ---
> > > Changes since v4:
> > >   * Use sg_lb_stats::sum_nr_running the idle state of a scheduling group.
> > >     (Vincent, Peter)
> > >   * Do not even idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
> > >   * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > >   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > >     powerpc folks showed that this patch should not impact them. Also, more
> > >     recent powerpc processor no longer use asym_packing. (PeterZ)
> > >   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > >   * Removed unnecessary check for local CPUs when the local group has zero
> > >     utilization. (Joel)
> > >   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > >     the fact that it deals with SMT cases.
> > >   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > >     that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > >   * Reworded the commit message to reflect updates in code.
> > >   * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > >     balancing. (PeterZ)
> > >   * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > >     sched_asym().
> > >
> > > Changes since v1:
> > >   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > >     tasks. Instead, reclassify the candidate busiest group, as it may
> > >     still be selected. (PeterZ)
> > >   * Avoid an expensive and unnecessary call to cpumask_weight() when
> > >     determining if a sched_group is comprised of SMT siblings. (PeterZ).
> > > ---
> > >  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > >  	return group_has_spare;
> > >  }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu:	Destination CPU of the load balancing
> > > + * @sds:	Load-balancing data with statistics of the local group
> > > + * @sgs:	Load-balancing statistics of the candidate busiest group
> > > + * @sg:	The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > +				    struct sg_lb_stats *sgs,
> > > +				    struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +	bool local_is_smt, sg_is_smt;
> > > +	int sg_busy_cpus;
> > > +
> > > +	local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > +	sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > +	sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > +	if (!local_is_smt) {
> > > +		/*
> > > +		 * If we are here, @dst_cpu is idle and does not have SMT
> > > +		 * siblings. Pull tasks if candidate group has two or more
> > > +		 * busy CPUs.
> > > +		 */
> > > +		if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt ? If sg_busy_cpus >= 2 then
> > sg_is_smt must be true ?
>
> Thank you very much for your feedback Vincent! Yes, it is true that
> sg_busy_cpus >= 2 is only true if @sg is SMT. I will remove this check.
>
> > Also, this is the default behavior where we want to even the number
> > of busy cpus. Shouldn't you return false and fall back to the default
> > behavior ?
>
> This is also true.
>
> > That being said, the default behavior tries to even the number of
> > idle cpus, which is easier to compute and is equivalent to evening the
> > number of busy cpus in a "normal" system with the same number of cpus
> > in each group, but this is not the case here. It could be good to
> > change the default behavior to even the number of busy cpus, and for
> > you to use the default behavior here. Additional conditions would then
> > be used to select the busiest group, such as more busy cpus or a
> > larger number of running tasks.
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.
>
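To make the group-size point concrete, here is a small user-space sketch. It is not kernel code and not taken from the patch: struct group_counts and both helpers are hypothetical, and the comparisons are deliberately simplified. It only shows that comparing idle CPUs and comparing busy CPUs can disagree once the groups have different weights.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-group counters used below. */
struct group_counts {
	int weight;	/* number of CPUs in the group */
	int idle_cpus;	/* number of idle CPUs in the group */
};

/* Busy CPUs derived the same way the patch computes sg_busy_cpus. */
static int busy_cpus(const struct group_counts *g)
{
	assert(g->idle_cpus >= 0 && g->idle_cpus <= g->weight);
	return g->weight - g->idle_cpus;
}

/* Simplified check: the local group has more idle CPUs than the candidate. */
static bool imbalanced_by_idle(const struct group_counts *local,
			       const struct group_counts *cand)
{
	return local->idle_cpus > cand->idle_cpus;
}

/* Simplified check: the candidate group has more busy CPUs than the local one. */
static bool imbalanced_by_busy(const struct group_counts *local,
			       const struct group_counts *cand)
{
	return busy_cpus(cand) > busy_cpus(local);
}
```

With a local group of one idle non-SMT CPU (weight 1, idle 1) and a candidate SMT core with one busy sibling (weight 2, idle 1), the idle-based check reports no imbalance while the busy-based check does. With two groups of weight 2 the two checks agree.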
> > > +			return true;
> > > +
> > > +		/*
> > > +		 * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > +		 * siblings and only one is busy. In such case, @dst_cpu
> > > +		 * can help if it has higher priority and is idle (i.e.,
> > > +		 * it has no running tasks).
> >
> > The previous comment above assumes that "@dst_cpu is idle" but now you
> > need to check that sds->local_stat.sum_nr_running == 0
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle
> (env->idle != CPU_NOT_IDLE).

That's my point: Why do you add the condition !sds->local_stat.sum_nr_running below ? I assume that it's to check that the cpu is idle, isn't it ?

> > > +		 */
> > > +		return !sds->local_stat.sum_nr_running &&
> > > +			sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > +	}
>
> Thanks and BR,
> Ricardo
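
For readers following the thread, the three cases listed in the function's kernel-doc comment can be modelled in user space roughly as below. This is a sketch under stated assumptions, not the patch itself: struct group_model and model_can_pull() are hypothetical, a single integer priority per group stands in for sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu), "all SMT siblings of dst_cpu are idle" is approximated by sum_nr_running == 0, and returning false for bigger busy-CPU deltas is assumed from the comment saying they are handled in update_sd_pick_busiest(). The sg_is_smt test is left out of the first branch, as agreed in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the inputs; not the kernel's structures. */
struct group_model {
	bool is_smt;		/* group consists of SMT siblings */
	int weight;		/* number of CPUs in the group */
	int idle_cpus;		/* number of idle CPUs in the group */
	int sum_nr_running;	/* running tasks in the group */
	int priority;		/* assumption: higher value is preferred */
};

static bool model_can_pull(const struct group_model *local,
			   const struct group_model *sg)
{
	int sg_busy_cpus, local_busy_cpus;
	bool dst_preferred = local->priority > sg->priority;

	assert(sg->idle_cpus <= sg->weight);
	assert(local->idle_cpus <= local->weight);

	sg_busy_cpus = sg->weight - sg->idle_cpus;
	local_busy_cpus = local->weight - local->idle_cpus;

	if (!local->is_smt) {
		/* Two or more busy CPUs in @sg: pull regardless of priority. */
		if (sg_busy_cpus >= 2)
			return true;
		/* Otherwise help only if local is idle and preferred. */
		return local->sum_nr_running == 0 && dst_preferred;
	}

	if (sg->is_smt) {
		/* Both SMT: act only on a busy-CPU delta of exactly one. */
		if (sg_busy_cpus - local_busy_cpus == 1)
			return dst_preferred;
		return false;
	}

	/* Local is SMT, @sg is not: need a fully idle local core. */
	return local->sum_nr_running == 0 && dst_preferred;
}
```

For example, an idle non-SMT destination always pulls from an SMT core with both siblings busy, whereas an SMT destination with one busy sibling does not pull from a non-SMT CPU even when the destination has higher priority.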