Thread: 17 messages, 3 authors, 2021-04-19

Re: [PATCH v2] sched: Warn on long periods of pending need_resched

From: Peter Zijlstra <peterz@infradead.org>
Date: 2021-03-26 08:59:48
Also in: linux-fsdevel, lkml

On Thu, Mar 25, 2021 at 02:58:52PM -0700, Josh Don wrote:
On Wed, Mar 24, 2021 at 01:39:16PM +0000, Mel Gorman wrote:
I'm not going to NAK because I do not have hard data that shows they must
exist. However, I won't ACK either because I bet a lot of tasty beverages
the next time we meet that the following parameters will generate reports
if removed.

kernel.sched_latency_ns
kernel.sched_migration_cost_ns
kernel.sched_min_granularity_ns
kernel.sched_wakeup_granularity_ns

I know they are altered by tuned for different profiles and some people do
go to the effort to create custom profiles for specific applications. They
also show up in "Official Benchmarking" such as SPEC CPU 2017 and
some vendors put a *lot* of effort into SPEC CPU results for bragging
rights. They show up in technical books and best practice guides for
applications.  Finally they show up in Google when searching for "tuning
sched_foo". I'm not saying that any of these are even accurate or a good
idea, just that they show up near the top of the results and they are
sufficiently popular that they might as well be an ABI.

+1, these seem like sufficiently well-known scheduler tunables, and
not really SCHED_DEBUG.

So we've never made any guarantees on their behaviour, nor am I willing
to make any.

In fact, I propose we merge the below along with the debugfs move. Just
to make absolutely sure any 'tuning' is broken.



---
Subject: sched,fair: Alternative sched_slice()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Mar 25 13:44:46 CET 2021

The current sched_slice() seems to have issues; there are two possible
things that could be improved:

 - the 'nr_running' used for __sched_period() is daft when cgroups are
   considered. Using the RQ wide h_nr_running seems like a much more
   consistent number.

 - (esp) cgroups can slice it real fine (pun intended), which makes for
   easy over-scheduling; ensure min_gran is what the name says.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c     |   15 ++++++++++++++-
 kernel/sched/features.h |    3 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,16 @@ static u64 __sched_period(unsigned long
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	unsigned int nr_running = cfs_rq->nr_running;
+	u64 slice;
+
+	if (sched_feat(ALT_PERIOD))
+		nr_running = rq_of(cfs_rq)->cfs.h_nr_running;
+
+	slice = __sched_period(nr_running + !se->on_rq);
+
+	if (sched_feat(BASE_SLICE))
+		slice -= sysctl_sched_min_granularity;
 
 	for_each_sched_entity(se) {
 		struct load_weight *load;
@@ -697,6 +706,10 @@ static u64 sched_slice(struct cfs_rq *cf
 		}
 		slice = __calc_delta(slice, se->load.weight, load);
 	}
+
+	if (sched_feat(BASE_SLICE))
+		slice += sysctl_sched_min_granularity;
+
 	return slice;
 }
 
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -90,3 +90,6 @@ SCHED_FEAT(WA_BIAS, true)
  */
 SCHED_FEAT(UTIL_EST, true)
 SCHED_FEAT(UTIL_EST_FASTUP, true)
+
+SCHED_FEAT(ALT_PERIOD, true)
+SCHED_FEAT(BASE_SLICE, true)
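
For readers following the arithmetic, here is a rough user-space model of
what the patched sched_slice() computes. It is a sketch under stated
assumptions, not kernel code: the names model_period(), model_slice() and
struct level are hypothetical, the three constants are assumed default
values rather than anything read from a running system, plain integer
division stands in for the fixed-point __calc_delta(), and the caller is
expected to pass an nr_running that already includes the "+ !se->on_rq"
term. Passing the per-cgroup count or the runqueue-wide count as
nr_running models ALT_PERIOD off or on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed defaults in nanoseconds; illustrative only. */
static const uint64_t min_gran_ns = 750000;   /* sysctl_sched_min_granularity */
static const uint64_t latency_ns  = 6000000;  /* sysctl_sched_latency */
static const unsigned nr_latency  = 8;        /* sched_nr_latency */

/* One level of the entity hierarchy (hypothetical helper type):
 * the entity's weight and the total load of the runqueue it sits on. */
struct level {
	uint64_t weight;
	uint64_t load;
};

/* Models __sched_period(): stretch the period once there are more
 * runnable tasks than fit in the latency target. */
static uint64_t model_period(unsigned nr_running)
{
	if (nr_running > nr_latency)
		return (uint64_t)nr_running * min_gran_ns;
	return latency_ns;
}

/* Models the patched sched_slice(). With base_slice set, only the part
 * of the period above min_gran is divided down the hierarchy, and
 * min_gran is added back afterwards, so the result never drops below it. */
static uint64_t model_slice(unsigned nr_running, const struct level *lv,
			    size_t n, int base_slice)
{
	uint64_t slice = model_period(nr_running);

	if (base_slice)
		slice -= min_gran_ns;

	for (size_t i = 0; i < n; i++) {
		/* Simplification: exact division, not __calc_delta(). */
		assert(lv[i].load != 0 && lv[i].weight <= lv[i].load);
		slice = slice * lv[i].weight / lv[i].load;
	}

	if (base_slice)
		slice += min_gran_ns;

	return slice;
}
```

With two nested levels that each hold a 1/8 share (weight 1024 of load
8192) and nr_running = 2, this model gives 93750 ns without BASE_SLICE and
832031 ns with it, which illustrates the over-scheduling point in the
changelog: the unpatched slice can fall well below min_gran.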