Thread: 21 messages, 3 authors, 2021-05-24

Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events

From: Dave Hansen <hidden>
Date: 2021-05-24 15:58:56
Also in: lkml

On 5/24/21 2:07 AM, Mel Gorman wrote:
> On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
>> On 5/21/21 3:28 AM, Mel Gorman wrote:
>>> The PCP high watermark is based on the number of online CPUs so the
>>> watermarks must be adjusted during CPU hotplug. At the time of
>>> hot-remove, the number of online CPUs is already adjusted but during
>>> hot-add, a delta needs to be applied to update PCP to the correct
>>> value. After this patch is applied, the high watermarks are adjusted
>>> correctly.
>>>
>>>   # grep high: /proc/zoneinfo  | tail -1
>>>               high:  649
>>>   # echo 0 > /sys/devices/system/cpu/cpu4/online
>>>   # grep high: /proc/zoneinfo  | tail -1
>>>               high:  664
>>>   # echo 1 > /sys/devices/system/cpu/cpu4/online
>>>   # grep high: /proc/zoneinfo  | tail -1
>>>               high:  649
>>
>> This is actually a comment more about the previous patch, but it doesn't
>> really become apparent until the example above.
>>
>> In your example, you mentioned increased exit() performance by using
>> "vm.percpu_pagelist_fraction to increase the pcp->high value".  That's
>> presumably because of the increased batching effects and fewer lock
>> acquisitions.
>
> Yes
>
>> But, logically, doesn't that mean that, the more CPUs you have in a
>> node, the *higher* you want pcp->high to be?  If we took this to the
>> extreme and had an absurd number of CPUs in a node, we could end up with
>> a too-small pcp->high value.
>
> I see your point but I don't think increasing pcp->high for larger
> numbers of CPUs is the right answer because then reclaim can be
> triggered simply because too many PCPs have pages.
>
> To address your point requires much deeper surgery.
> ...
> There is value to doing something like this but it's beyond what this
> series is trying to do and doing the work without introducing regressions
> would be very difficult.

Agreed, such a solution is outside of the scope of what this set is
trying to do.

It would be nice to touch on this counter-intuitive property in the
changelog, and *maybe* add a WARN_ON_ONCE() if we hit an edge case.
Maybe WARN_ON_ONCE() if pcp->high gets below pcp->batch*SOMETHING.
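
For illustration only, here is a rough userspace sketch of the kind of check
suggested above. It is not code from the series. It assumes pcp->high is
roughly a zone watermark divided by the number of CPUs in the node, which is
only a simplification of "based on the number of online CPUs". The names
model_pcp_high(), pcp_high_too_small(), check_pcp_high_once() and the
multiplier PCP_HIGH_MIN_BATCHES (standing in for "SOMETHING") are all
hypothetical, and fprintf() stands in for the kernel's WARN_ON_ONCE().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical placeholder for "SOMETHING" in pcp->batch*SOMETHING. */
#define PCP_HIGH_MIN_BATCHES 4

/*
 * Simplified model (an assumption, not the patch's formula): the zone's
 * page budget is split evenly across the CPUs in the node, so more CPUs
 * means a smaller per-CPU high mark.
 */
static unsigned long model_pcp_high(unsigned long zone_pages,
				    unsigned int nr_cpus)
{
	if (nr_cpus == 0)
		nr_cpus = 1;	/* avoid dividing by zero */
	return zone_pages / nr_cpus;
}

/* True when high has dropped below the hypothetical batch multiple. */
static bool pcp_high_too_small(unsigned long high, unsigned long batch)
{
	return high < batch * PCP_HIGH_MIN_BATCHES;
}

/*
 * Userspace stand-in for WARN_ON_ONCE(): report the edge case the first
 * time it is seen, and return whether the condition held on this call.
 */
static bool check_pcp_high_once(unsigned long high, unsigned long batch)
{
	static bool warned;
	bool too_small = pcp_high_too_small(high, batch);

	if (too_small && !warned) {
		warned = true;
		fprintf(stderr, "pcp->high %lu is below %d * batch (%lu)\n",
			high, PCP_HIGH_MIN_BATCHES, batch);
	}
	return too_small;
}
```

With this model, 10000 pages shared by 4 CPUs gives high = 2500, well above
4 * 63. The same 10000 pages shared by 512 CPUs gives high = 19, which trips
the check. That is the "absurd number of CPUs" edge case in miniature.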




