Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration
From: Marcelo Tosatti <hidden>
Date: 2017-05-02 17:21:55
Also in:
linux-mm, lkml
On Tue, May 02, 2017 at 01:15:27PM -0400, Luiz Capitulino wrote:
> On Tue, 2 May 2017 13:52:00 -0300
> Marcelo Tosatti [off-list ref] wrote:
>
> > > I have several questions about the tunables:
> > >
> > > - What does the vmstat_threshold value mean? What are the implications
> > >   of changing this value? What's the difference in choosing 1, 2, 3
> > >   or 500?
> >
> > It's the maximum value for a vmstat statistics counter to hold. After
> > that value, the statistics are transferred to the global counter:
> >
> > void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
> > 				long delta)
> > {
> > 	struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
> > 	s8 __percpu *p = pcp->vm_node_stat_diff + item;
> > 	long x;
> > 	long t;
> >
> > 	x = delta + __this_cpu_read(*p);
> >
> > 	t = __this_cpu_read(pcp->stat_threshold);
> >
> > 	if (unlikely(x > t || x < -t)) {
> > 		node_page_state_add(x, pgdat, item);
> > 		x = 0;
> > 	}
> > 	__this_cpu_write(*p, x);
> > }
> > EXPORT_SYMBOL(__mod_node_page_state);
> >
> > BTW, there is a bug there, should change that to:
> >
> > 	if (unlikely(x >= t || x <= -t)) {
> >
> > Increasing the threshold value does two things:
> > 	1) It decreases the number of inter-processor accesses.
> > 	2) It increases how much the global counters stay out of
> > 	   sync relative to actual current values.
>
> OK, but I'm mostly concerned with the sysadmin who will have to change
> the tunable. So, I think it's a good idea to improve the doc to contain
> that information.

Yes, how is that:

Index: linux-2.6-git-disable-vmstat-worker/Documentation/vm/vmstat_thresholds.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-git-disable-vmstat-worker/Documentation/vm/vmstat_thresholds.txt	2017-05-02 13:48:45.946840708 -0300
@@ -0,0 +1,78 @@
+Userspace configurable vmstat thresholds
+========================================
+
+This document describes the tunables to control
+per-CPU vmstat threshold and per-CPU vmstat worker
+thread.
+
+/sys/devices/system/cpu/cpuN/vmstat/vmstat_threshold:
+
+This file contains the per-CPU vmstat threshold.
+This value is the maximum that a single per-CPU vmstat statistic
+can accumulate before transferring to the global counters.
+
+A value of 0 indicates that the value is set
+by the in-kernel algorithm.
+
+A value different than 0 indicates that particular
+value is used for vmstat_threshold.
+
+/sys/devices/system/cpu/cpuN/vmstat/vmstat_worker:
+
+Enable/disable the per-CPU vmstat worker.
+
+What does the vmstat_threshold value mean? What are the implications
+of changing this value? What's the difference in choosing 1, 2, 3
+or 500?
+====================================================================
+
+It's the maximum value for a vmstat statistics counter to hold. After
+that value, the statistics are transferred to the global counter:
+
+void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
+				long delta)
+{
+	struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
+	s8 __percpu *p = pcp->vm_node_stat_diff + item;
+	long x;
+	long t;
+
+	x = delta + __this_cpu_read(*p);
+
+	t = __this_cpu_read(pcp->stat_threshold);
+
+	if (unlikely(x > t || x < -t)) {
+		node_page_state_add(x, pgdat, item);
+		x = 0;
+	}
+	__this_cpu_write(*p, x);
+}
+
+Increasing the threshold value does two things:
+	1) It decreases the number of inter-processor accesses.
+	2) It increases how much the global counters stay out of
+	   sync relative to actual current values.
+
+
+Usage example:
+==============
+
+In a realtime system, the worker thread waking up and executing
+vmstat_update can be an undesired source of latencies.
+
+To prevent the worker thread from waking up and executing vmstat_update
+on cpu 1, for example, perform the following steps:
+
+
+$ cd /sys/devices/system/cpu/cpu1/vmstat/
+
+# Set vmstat threshold to 1 for cpu1, so that no
+# vmstat statistics are collected in cpu1's per-cpu
+# stats; instead they are immediately transferred
+# to the global counter.
+
+$ echo 1 > vmstat_threshold
+
+# Disable vmstat_update worker for cpu1:
+$ echo 0 > vmstat_worker
+
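
To make the trade-off described in the documentation above concrete, here is a minimal userspace sketch (not kernel code) that replays a stream of +1 updates through one simulated per-CPU counter and reports how often the local delta is folded into the global counter and how far the global counter lags. The names simulate_updates, struct sim_result and the inclusive flag are invented for this illustration; the local delta is a plain long rather than the kernel's s8, and only a single CPU is modelled.

```c
/* Outcome of replaying a stream of +1 updates through one simulated counter. */
struct sim_result {
	long flushes;	/* times the local delta was folded into the global counter */
	long global;	/* simulated global counter after the last update */
	long max_lag;	/* largest observed (true total - global counter) */
};

/*
 * Hypothetical userspace model of the threshold check in
 * __mod_node_page_state(). Assumptions: one CPU, every update is +1.
 *
 * inclusive == 0: comparison as in the quoted code   (x >  t || x <  -t)
 * inclusive != 0: comparison proposed in the thread  (x >= t || x <= -t)
 */
struct sim_result simulate_updates(long updates, long threshold, int inclusive)
{
	struct sim_result r = { 0, 0, 0 };
	long local = 0;	/* stands in for the per-CPU differential */
	long truth = 0;	/* exact number of updates applied so far */
	long i;

	for (i = 0; i < updates; i++) {
		long x = local + 1;
		int over = inclusive ? (x >= threshold || x <= -threshold)
				     : (x > threshold || x < -threshold);

		truth += 1;
		if (over) {
			/* Fold the accumulated delta into the global counter. */
			r.global += x;
			r.flushes++;
			x = 0;
		}
		local = x;

		if (truth - r.global > r.max_lag)
			r.max_lag = truth - r.global;
	}
	return r;
}
```

Under this model, simulate_updates(1000, 1, 0) flushes 500 times and lets the global counter lag by at most 1, while simulate_updates(1000, 1, 1) flushes on every update with no lag, which is the behaviour the documentation describes for a threshold of 1. With a threshold of 100 the same stream causes only 9 or 10 flushes, at the cost of a lag of up to 100 or 99 respectively.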

> > > - If the purpose of having vmstat_threshold is to allow disabling the
> > >   vmstat kworker, why can't the kernel pick a value automatically?
> >
> > Because it might be acceptable for the user to accept a small
> > out of syncedness of the global counters in favour of performance
> > (one would have to analyze the situation).
> >
> > Setting vmstat_threshold == 1 means the global counter is always
> > in sync with the page counter state of the pCPU.
>
> IMHO, if vmstat_threshold == 1 is the required setting for disabling
> the vmstat kworker then I'd go with only one tunable for now. But that's
> just a suggestion.

I didn't want to force that on the user because allowing different
tunables covers more cases.

> > > - What are the implications of disabling the vmstat kworker? Will vm
> > >   stats still be collected someway or will it be completely off for
> > >   the CPU?
> >
> > It will not be necessary to collect vmstats because at every modification
> > of the vm statistics, pCPUs with vmstat_threshold=1 transfer their
> > values to the global counters (that is, there is no queueing of statistics
> > locally to improve performance).
>
> Ah, OK. Got this now. I'll give this patch a try. But I think we want
> to hear from Christoph (who worked on reducing the vmstat interruptions
> in the past).

Christoph?

> > > Also, shouldn't this patch be split into two?
> >
> > First add one sysfs file, then add another sysfs file, you mean?
>
> Yes, one tunable per patch.

Sure.