Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events
From: Dave Hansen <hidden>
Date: 2021-05-24 15:58:56
Also in: lkml
On 5/24/21 2:07 AM, Mel Gorman wrote:
> On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
>> On 5/21/21 3:28 AM, Mel Gorman wrote:
>>> The PCP high watermark is based on the number of online CPUs, so the
>>> watermarks must be adjusted during CPU hotplug. At the time of
>>> hot-remove, the number of online CPUs has already been adjusted, but
>>> during hot-add a delta needs to be applied to update PCP to the
>>> correct value. After this patch is applied, the high watermarks are
>>> adjusted correctly.
>>>
>>>   # grep high: /proc/zoneinfo | tail -1
>>>   high: 649
>>>   # echo 0 > /sys/devices/system/cpu/cpu4/online
>>>   # grep high: /proc/zoneinfo | tail -1
>>>   high: 664
>>>   # echo 1 > /sys/devices/system/cpu/cpu4/online
>>>   # grep high: /proc/zoneinfo | tail -1
>>>   high: 649
>>
>> This is actually more of a comment about the previous patch, but it
>> doesn't really become apparent until the example above.
>>
>> In your example, you mentioned increased exit() performance by using
>> "vm.percpu_pagelist_fraction to increase the pcp->high value". That's
>> presumably because of the increased batching effects and fewer lock
>> acquisitions.
>
> Yes
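A minimal user-space sketch of the hotplug arithmetic the changelog describes, assuming pcp->high is simply a fixed per-zone page budget divided evenly across the CPUs on the zone's node. `model_pcp_high()` and its parameters are hypothetical, not the kernel's code. The budget of 28560 pages on a 44-CPU node used below is made up purely so the integer division lands on the 649 and 664 readings above; the thread does not state the test machine's real values.

```c
/*
 * Hypothetical user-space model: split a zone's pcp->high budget evenly
 * across the CPUs local to its node. total_pages is an input here; how
 * the kernel derives it is not modelled.
 */
static unsigned long model_pcp_high(unsigned long total_pages,
				    unsigned int cpus_in_mask,
				    int cpu_being_onlined)
{
	/*
	 * On hot-remove the departing CPU is already excluded from the count.
	 * On hot-add the new CPU is not counted yet, so the caller passes
	 * cpu_being_onlined = 1 to apply the delta.
	 */
	unsigned int nr_cpus = cpus_in_mask + (cpu_being_onlined ? 1 : 0);

	/* This sketch falls back to 1 to avoid dividing by zero on a CPU-less node. */
	if (nr_cpus == 0)
		nr_cpus = 1;

	return total_pages / nr_cpus;
}
```

With those hypothetical inputs, `model_pcp_high(28560, 44, 0)` gives 649. After cpu4 goes offline the count is 43 and `model_pcp_high(28560, 43, 0)` gives 664. When cpu4 comes back, the changelog says the new CPU is not yet counted, so passing the delta (`model_pcp_high(28560, 43, 1)`) restores 649, whereas omitting it would leave the stale 664.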
>> But, logically, doesn't that mean that the more CPUs you have in a
>> node, the *higher* you want pcp->high to be? If we took this to the
>> extreme and had an absurd number of CPUs in a node, we could end up
>> with a too-small pcp->high value.
>
> I see your point, but I don't think increasing pcp->high for larger
> numbers of CPUs is the right answer, because then reclaim can be
> triggered simply because too many PCPs have pages. Addressing your
> point requires much deeper surgery.
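The tension between the two positions can be put into numbers with another hypothetical sketch. It assumes, as a simplification not taken from the patch, that reclaim pressure appears once the pages parked on all PCP lists exceed the same per-zone budget; the function names and values are illustrative only.

```c
#include <stdbool.h>

/* Even split of a fixed budget: per-CPU high shrinks as CPUs are added. */
static unsigned long split_high(unsigned long budget, unsigned int nr_cpus)
{
	return nr_cpus ? budget / nr_cpus : budget;
}

/*
 * True if every CPU filling its PCP list up to `high` would park more
 * pages than the budget allows (the simplified reclaim trigger assumed
 * by this sketch).
 */
static bool pcp_exceeds_budget(unsigned long high, unsigned int nr_cpus,
			       unsigned long budget)
{
	return high * nr_cpus > budget;
}
```

Splitting 28560 pages across 44 CPUs gives a per-CPU high of 649 and stays within budget, but across 1000 CPUs it drops to 28, which is the too-small value Dave describes. Keeping 649 per CPU on an 88-CPU node instead would allow 57112 pages on PCP lists, roughly twice the budget, which is the reclaim concern Mel raises.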
> ...
>
> There is value in doing something like this, but it's beyond what this
> series is trying to do, and doing the work without introducing
> regressions would be very difficult.

Agreed, such a solution is outside the scope of what this set is trying
to do. It would be nice to touch on this counter-intuitive property (a
node with more CPUs ends up with a *smaller* pcp->high) in the
changelog, and *maybe* add a WARN_ON_ONCE() if we hit an edge case, for
example if pcp->high gets below pcp->batch*SOMETHING.
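One way the suggested sanity check might look, sketched in user space. `HYPOTHETICAL_BATCH_MULT` stands in for the unspecified SOMETHING and its value of 4 is arbitrary; the static flag imitates WARN_ON_ONCE() reporting only the first occurrence, and, like the kernel macro, the helper returns whether the condition was true. In kernel code the helper would presumably collapse to a single WARN_ON_ONCE() on the same comparison. The batch value of 63 in the usage note is illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical value for the "SOMETHING" multiplier; 4 is arbitrary. */
#define HYPOTHETICAL_BATCH_MULT 4UL

/* Set on the first report so later hits stay silent, like WARN_ON_ONCE(). */
static bool pcp_high_warned;

/* Returns true whenever high falls below batch * multiplier. */
static bool check_pcp_high(unsigned long high, unsigned long batch)
{
	bool too_low = high < batch * HYPOTHETICAL_BATCH_MULT;

	if (too_low && !pcp_high_warned) {
		pcp_high_warned = true;
		fprintf(stderr, "WARNING: pcp->high %lu < pcp->batch %lu * %lu\n",
			high, batch, HYPOTHETICAL_BATCH_MULT);
	}
	return too_low;
}
```

`check_pcp_high(649, 63)` returns false because 649 is above 252. `check_pcp_high(28, 63)` returns true and prints the warning, and any later low value still returns true but prints nothing.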