Re: [PATCH 6/7] psi: pressure stall information for CPU, memory, and IO
From: Randy Dunlap <hidden>
Date: 2018-05-08 00:43:22
Also in: linux-mm, lkml
On 05/07/2018 02:01 PM, Johannes Weiner wrote:
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 Documentation/accounting/psi.txt |  73 ++++++
 include/linux/psi.h              |  27 ++
 include/linux/psi_types.h        |  84 ++++++
 include/linux/sched.h            |  10 +
 include/linux/sched/stat.h       |  10 +-
 init/Kconfig                     |  16 ++
 kernel/fork.c                    |   4 +
 kernel/sched/Makefile            |   1 +
 kernel/sched/core.c              |   3 +
 kernel/sched/psi.c               | 424 +++++++++++++++++++++++++++++++
 kernel/sched/sched.h             | 166 ++++++------
 kernel/sched/stats.h             |  91 ++++++-
 mm/compaction.c                  |   5 +
 mm/filemap.c                     |  15 +-
 mm/page_alloc.c                  |  10 +
 mm/vmscan.c                      |  13 +
 16 files changed, 859 insertions(+), 93 deletions(-)
 create mode 100644 Documentation/accounting/psi.txt
 create mode 100644 include/linux/psi.h
 create mode 100644 include/linux/psi_types.h
 create mode 100644 kernel/sched/psi.c

diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
new file mode 100644
index 000000000000..e051810d5127
--- /dev/null
+++ b/Documentation/accounting/psi.txt
@@ -0,0 +1,73 @@
Looks good to me.
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
new file mode 100644
index 000000000000..052c529a053b
--- /dev/null
+++ b/kernel/sched/psi.c
@@ -0,0 +1,424 @@
+/*
+ * Measure workload productivity impact from overcommitting CPU, memory, IO
+ *
+ * Copyright (c) 2017 Facebook, Inc.
+ * Author: Johannes Weiner <hannes@cmpxchg.org>
+ *
+ * Implementation
+ *
+ * Task states -- running, iowait, memstall -- are tracked through the
+ * scheduler and aggregated into a system-wide productivity state. The
+ * ratio between the times spent in productive states and delays tells
+ * us the overall productivity of the workload.
+ *
+ * The ratio is tracked in decaying time averages over 10s, 1m, 5m
+ * windows. Cumluative stall times are tracked and exported as well to
Cumulative
+ * allow detection of latency spikes and custom time averaging.
+ *
+ * Multiple CPUs
+ *
+ * To avoid cache contention, times are tracked local to the CPUs. To
+ * get a comprehensive view of a system or cgroup, we have to consider
+ * the fact that CPUs could be unevenly loaded or even entirely idle
+ * if the workload doesn't have enough threads. To avoid artifacts
+ * caused by that, when adding up the global pressure ratio, the
+ * CPU-local ratios are weighed according to their non-idle time:
+ *
+ *   Time the CPU had stalled tasks    Time the CPU was non-idle
+ *   ------------------------------ * ---------------------------
+ *                Walltime            Time all CPUs were non-idle
+ */
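
For anyone following along, the weighting described in that comment can be sketched in user-space C roughly as below. This is only an illustration of the formula as written, not the patch's code: the struct, its field names, and weighted_pressure() are all made up here, and it uses doubles for readability, which in-kernel code would not.

```c
#include <stddef.h>

/* Hypothetical per-CPU sample; names are illustrative, not from the patch. */
struct cpu_sample {
	double stalled;	/* time this CPU had stalled tasks */
	double nonidle;	/* time this CPU was non-idle */
};

/*
 * Add up the CPU-local stall ratios (stalled / walltime), each weighted
 * by that CPU's share of the total non-idle time across all CPUs.
 * Returns 0 when walltime is not positive or when every CPU was idle.
 */
double weighted_pressure(const struct cpu_sample *cpus, size_t n,
			 double walltime)
{
	double total_nonidle = 0.0;
	double pressure = 0.0;
	size_t i;

	if (walltime <= 0.0)
		return 0.0;

	for (i = 0; i < n; i++)
		total_nonidle += cpus[i].nonidle;

	if (total_nonidle <= 0.0)
		return 0.0;

	for (i = 0; i < n; i++)
		pressure += (cpus[i].stalled / walltime) *
			    (cpus[i].nonidle / total_nonidle);

	return pressure;
}
```

With one CPU stalled for half of the window and a second CPU entirely idle, this gives 0.5 where a plain per-CPU average would give 0.25, which I take to be the artifact the comment is talking about.
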
+
+/**
+ * psi_memstall_leave - mark the end of an memory stall section
end of a memory
+ * @flags: flags to handle nested memdelay sections
+ *
+ * Marks the calling task as no longer stalled due to lack of memory.
+ */
+void psi_memstall_leave(unsigned long *flags)
+{

--
~Randy