Re: [PATCH 3/3] sched: Aggressive balance in domains whose groups share package resources
From: Peter Zijlstra <peterz@infradead.org>
Date: 2013-10-28 15:53:37
Also in: lkml
On Mon, Oct 21, 2013 at 05:15:02PM +0530, Vaidyanathan Srinivasan wrote:
> From: Preeti U Murthy <redacted>
>
> The current logic in load balance is such that after picking the
> busiest group, the load is attempted to be moved from the busiest cpu
> in that group to the dst_cpu. If the load cannot be moved from the
> busiest cpu to dst_cpu due to either tsk_cpus_allowed mask or cache
> hot tasks, then the dst_cpu is changed to be another idle cpu within
> the dst->grpmask. If even then, the load cannot be moved from the
> busiest cpu, then the source group is changed. The next busiest group
> is found and the above steps are repeated.
>
> However if the cpus in the group share package resources, then when a
> load movement from the busiest cpu in this group fails as above,
> instead of finding the next busiest group to move load from, find the
> next busiest cpu *within the same group* from which to move load away.
> By doing so, a conscious effort is made during load balancing to keep
> just one cpu busy as much as possible within domains that have
> SHARED_PKG_RESOURCES flag set unless under scenarios of high load.
> Having multiple cpus busy within a domain which share package resource
> could lead to a performance hit.
>
> A similar scenario arises in active load balancing as well. When the
> current task on the busiest cpu cannot be moved away due to task
> pinning, currently no more attempts at load balancing is made.
>
> This patch checks if the balancing is being done on a group whose cpus
> share package resources. If so, then check if the load balancing can
> be done for other cpus in the same group.
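For readers trying to follow the quoted description, the source-selection order it proposes can be sketched roughly as below. This is a toy model and not the patch or any kernel code: `cpu_model`, `group_model`, `next_busiest_cpu()` and `pick_source_cpu()` are hypothetical names invented for illustration, the `movable` flag stands in for the tsk_cpus_allowed and cache-hot checks, and the intermediate step of retrying with a different dst_cpu is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical, simplified stand-ins; these are NOT kernel structures. */
struct cpu_model {
    int load;      /* runnable load on this cpu */
    bool movable;  /* false models pinned or cache hot tasks */
};

struct group_model {
    struct cpu_model cpus[MAX_CPUS];
    size_t nr_cpus;             /* must be <= MAX_CPUS */
    bool shares_pkg_resources;  /* models the SHARED_PKG_RESOURCES case */
};

/* Index of the busiest cpu that has not been tried yet, or -1 if none. */
static int next_busiest_cpu(const struct group_model *g, const bool *tried)
{
    int best = -1;

    for (size_t i = 0; i < g->nr_cpus; i++) {
        if (tried[i] || g->cpus[i].load == 0)
            continue;
        if (best < 0 || g->cpus[i].load > g->cpus[best].load)
            best = (int)i;
    }
    return best;
}

/*
 * Returns the cpu to pull load from, or -1 when the caller should give
 * up on this group and look for the next busiest group instead.
 */
int pick_source_cpu(const struct group_model *g)
{
    bool tried[MAX_CPUS] = { false };
    int cpu;

    while ((cpu = next_busiest_cpu(g, tried)) >= 0) {
        if (g->cpus[cpu].movable)
            return cpu;
        tried[cpu] = true;
        /* Behaviour described as current: one failure ends the group. */
        if (!g->shares_pkg_resources)
            break;
        /* Proposed behaviour: retry with the next busiest cpu in the group. */
    }
    return -1;
}
```

In this model the only difference between the two behaviours is the `shares_pkg_resources` check inside the loop, which is the special case being objected to in the reply.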
So I absolutely hate this patch... Also I'm not convinced I actually
understand the explanation above.

Furthermore; there is nothing special about spreading tasks for
SHARED_PKG_RESOURCES and special casing that one case is just wrong.

If anything it should be keyed off of SD_PREFER_SIBLING and/or
cpu_power.