Thread: 28 messages, 5 authors, 2021-06-14

Re: [RFC PATCH 0/7] Memory hotplug/hotremove at subsection size

From: Zi Yan <ziy@nvidia.com>
Date: 2021-06-02 15:56:45
Also in: linuxppc-dev, lkml

On 10 May 2021, at 10:36, Zi Yan wrote:
On 7 May 2021, at 10:00, David Hildenbrand wrote:
On 07.05.21 13:55, Michal Hocko wrote:
[I haven't read through respective patches due to lack of time but let
  me comment on the general idea and the underlying justification]

On Thu 06-05-21 17:31:09, David Hildenbrand wrote:
On 06.05.21 17:26, Zi Yan wrote:
From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset tries to remove the restriction on memory hotplug/hotremove
granularity, which is always greater or equal to memory section size[1].
With the patchset, kernel is able to online/offline memory at a size independent
of memory section size, as small as 2MB (the subsection size).
... which doesn't make any sense as we can only online/offline whole memory
block devices.
Agreed. The subsection thingy is just a hack to work around pmem
alignment problems. For the real memory hotplug it is quite hard to
argue for reasonable hotplug scenarios for very small physical memory
ranges wrt. the existing sparsemem memory model.
The motivation is to increase MAX_ORDER of the buddy allocator and pageblock
size without increasing memory hotplug/hotremove granularity at the same time,
Gah, no. Please no. No.
Agreed. Those are completely independent concepts. MAX_ORDER can be
really arbitrary irrespective of the section size with the vmemmap sparse
model. The existing restriction is due to the old sparse model not being
able to do page pointer arithmetic across memory sections. Is there any
reason to stick with that memory model for an advanced feature you are
working on?
No. I just want to increase MAX_ORDER. If the existing restriction can
be removed, that will be great.
I gave it some more thought yesterday. I guess the first thing we should look into is increasing MAX_ORDER and leaving pageblock_order and section size as is -- finding out what we have to tweak to get that up and running. Once we have that in place, we can actually look into better fragmentation avoidance etc. One step at a time.
It makes sense to me.
Because that change itself might require some thought. Requiring that bigger MAX_ORDER depends on SPARSE_VMEMMAP is something reasonable to do.
OK, if with SPARSE_VMEMMAP MAX_ORDER can be set to be bigger than
SECTION_SIZE, it is perfectly OK with me. 1GB THP support, which I
ultimately want to add, will require SPARSE_VMEMMAP too (otherwise,
all page++ will need to be changed to nth_page(page,1)).
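
To illustrate why page++ is a problem without SPARSE_VMEMMAP, here is a minimal user-space sketch of the classic sparse model. It is a toy, not kernel code: the tiny section size, the per-section arrays, and the pfn field stored in struct page are all assumptions made for the example. The nth_page() here mirrors the pfn round trip that the kernel macro performs when vmemmap is not available.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of classic SPARSEMEM (no vmemmap); sizes are illustrative. */
#define PAGES_PER_SECTION 4UL
#define NR_SECTIONS       2UL

struct page {
	unsigned long pfn;	/* assumption: the real kernel encodes the section in page->flags */
};

/* Each section has its own mem_map; the arrays need not be adjacent. */
static struct page section0_map[PAGES_PER_SECTION];
static struct page section1_map[PAGES_PER_SECTION];
static struct page *section_mem_map[NR_SECTIONS] = { section0_map, section1_map };

static void init_mem_map(void)
{
	for (unsigned long pfn = 0; pfn < NR_SECTIONS * PAGES_PER_SECTION; pfn++)
		section_mem_map[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION].pfn = pfn;
}

static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);
	return &section_mem_map[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

static unsigned long page_to_pfn(const struct page *page)
{
	return page->pfn;
}

/*
 * Step n pages forward by going through the PFN, so the result lands in the
 * right section's mem_map. Plain "page + n" would run off the end of the
 * current section's array once it crosses a section boundary.
 */
static struct page *nth_page(struct page *page, unsigned long n)
{
	return pfn_to_page(page_to_pfn(page) + n);
}
```

After init_mem_map(), nth_page(pfn_to_page(3), 1) returns the first entry of section1_map, whereas pfn_to_page(3) + 1 would point one past the end of section0_map. With vmemmap all struct pages sit in one virtually contiguous array, which is why page++ keeps working there.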
As stated somewhere here already, we'll have to look into making alloc_contig_range() (and main users CMA and virtio-mem) independent of MAX_ORDER and mainly rely on pageblock_order. The current handling in alloc_contig_range() is far from optimal as we have to isolate a whole MAX_ORDER - 1 page -- and on ZONE_NORMAL we'll fail easily if any part contains something unmovable although we don't even want to allocate that part. I actually have that on my list (to be able to fully support pageblock_order instead of MAX_ORDER - 1 chunks in virtio-mem), however, I didn't have time to look into it.
So in your mind, for gigantic page allocation (> MAX_ORDER), alloc_contig_range()
should be used instead of the buddy allocator while pageblock_order is kept at a small
granularity like 2MB. Is that the case? Isn’t it going to have a high failure rate
when any of the pageblocks within a gigantic page range (like 1GB) becomes unmovable?
Are you thinking of an additional mechanism/policy to prevent such a thing from happening as
an additional step for gigantic page allocation? Like your ZONE_PREFER_MOVABLE idea?
Further, page onlining / offlining code and early init code most probably also need care if MAX_ORDER - 1 crosses sections. Memory holes we might suddenly have in MAX_ORDER - 1 pages might become a problem and will have to be handled. Not sure which other code has to be tweaked (compaction? page isolation?).
Can you elaborate on it a little more? From what I understand, memory holes mean valid
PFNs are not contiguous before and after a hole, so pfn++ will not work, but
struct pages are still virtually contiguous assuming SPARSE_VMEMMAP, meaning page++
would still work. So when MAX_ORDER - 1 crosses sections, additional code would be
needed instead of simple pfn++. Is there anything I am missing?

BTW, to test a system with memory holes, do you know if there is an easy way of adding
random memory holes to an x86_64 VM, which can help reveal potential missing pieces
in the code? Changing the BIOS-e820 table might be one way, but I have no idea
how to do it on QEMU.
Figuring out what needs care itself might take quite some effort.

One thing I was thinking about as well: The bigger our MAX_ORDER, the slower it could be to allocate smaller pages. If we have 1G pages, splitting them down to 4k then takes 8 additional steps if I'm not wrong. Of course, that's the worst case. Would be interesting to evaluate.
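
The "8 additional steps" figure can be checked with a little arithmetic. The sketch below assumes 4KB base pages and that the current largest buddy block is order 10 (MAX_ORDER of 11, i.e. 4MB); the helper names are made up for this example. Each buddy split halves a block, so going from a larger top order to the old one costs one extra split per order.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4KB base pages, as on x86_64 */

/* Buddy order of a power-of-two sized block: log2(bytes / PAGE_SIZE). */
static unsigned int size_to_order(unsigned long long bytes)
{
	unsigned int order = 0;

	assert(bytes >= (1ULL << PAGE_SHIFT));
	while ((1ULL << (PAGE_SHIFT + order)) < bytes)
		order++;
	return order;
}

/*
 * Worst case: the only free block is of the top order, and one split is
 * needed per order on the way down. Raising the top order therefore adds
 * (new_top - old_top) splits.
 */
static unsigned int extra_splits(unsigned int old_top_order,
				 unsigned int new_top_order)
{
	assert(new_top_order >= old_top_order);
	return new_top_order - old_top_order;
}
```

With these assumptions a 1GB block is order 18, and extra_splits(10, size_to_order(1ULL << 30)) gives 8, matching the estimate above.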
Sure. I am planning to check it too. As a simple start, I am going to run will-it-scale
benchmarks to see if there is any performance difference between different MAX_ORDERs.
I ran vm-scalability and memory-related will-it-scale on a server with 256GB memory to
see the impact of increasing MAX_ORDER and didn’t see much difference for most of
the workloads like page_fault1, page_fault2, and page_fault3 from will-it-scale.
But feel free to check the attached complete results and let me know what should be
looked into. Thanks.

# Environment
Dell R630 with 2x 12-core E5-2650 v4 and 256GB memory.


# Kernel changes
On top of v5.13-rc1-mmotm-2021-05-13-17-18, with SECTION_SIZE_BITS set to 31 and
MAX_ORDER set to 11 and 20 respectively.
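
For reference, the sizes implied by these settings can be worked out as follows. This is only a sketch: it assumes 4KB base pages and that MAX_ORDER is exclusive (the largest buddy block has order MAX_ORDER - 1, which is how the thread uses the term); the function names are invented for the example.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4KB base pages */

/* A memory section covers 2^SECTION_SIZE_BITS bytes. */
static unsigned long long section_size_bytes(unsigned int section_size_bits)
{
	return 1ULL << section_size_bits;
}

/* Largest buddy block: order (MAX_ORDER - 1) pages of 2^PAGE_SHIFT bytes. */
static unsigned long long max_buddy_block_bytes(unsigned int max_order)
{
	assert(max_order >= 1);
	return 1ULL << (PAGE_SHIFT + max_order - 1);
}
```

Under these assumptions SECTION_SIZE_BITS = 31 gives 2GB sections, MAX_ORDER = 11 gives 4MB blocks, and MAX_ORDER = 20 gives 2GB blocks, so in the second kernel the largest buddy block is exactly one section in size.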

# Results of page_fault1, page_fault2, and page_fault3


compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/50%/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3199850 ±  2%      +6.0%    3390866 ±  3%  will-it-scale.24.threads
     54.94            +1.7%      55.85        will-it-scale.24.threads_idle
    133326 ±  2%      +6.0%     141285 ±  3%  will-it-scale.per_thread_ops
   3199850 ±  2%      +6.0%    3390866 ±  3%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/50%/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2016984            -6.6%    1883075 ±  2%  will-it-scale.24.threads
     69.69            -4.4%      66.64        will-it-scale.24.threads_idle
     84040            -6.6%      78461 ±  2%  will-it-scale.per_thread_ops
   2016984            -6.6%    1883075 ±  2%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/50%/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2138067            -1.3%    2109865        will-it-scale.24.threads
     63.34            +1.1%      64.06        will-it-scale.24.threads_idle
     89085            -1.3%      87910        will-it-scale.per_thread_ops
   2138067            -1.3%    2109865        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/16/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3216287 ±  3%      +4.8%    3369356 ± 10%  will-it-scale.16.threads
     69.18            +0.5%      69.51        will-it-scale.16.threads_idle
    201017 ±  3%      +4.8%     210584 ± 10%  will-it-scale.per_thread_ops
   3216287 ±  3%      +4.8%    3369356 ± 10%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/16/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2005510            -2.7%    1950620 ±  2%  will-it-scale.16.threads
     78.77            -0.2%      78.64        will-it-scale.16.threads_idle
    125344            -2.7%     121913 ±  2%  will-it-scale.per_thread_ops
   2005510            -2.7%    1950620 ±  2%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/16/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2332446            -6.5%    2179823 ±  2%  will-it-scale.16.threads
     77.57            -2.0%      76.03        will-it-scale.16.threads_idle
    145777            -6.5%     136238 ±  2%  will-it-scale.per_thread_ops
   2332446            -6.5%    2179823 ±  2%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/100%/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3236057 ±  2%      -4.5%    3089222 ±  4%  will-it-scale.48.threads
     24.64 ±  7%      -3.3%      23.83 ±  2%  will-it-scale.48.threads_idle
     67417 ±  2%      -4.5%      64358 ±  4%  will-it-scale.per_thread_ops
   3236057 ±  2%      -4.5%    3089222 ±  4%  will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/100%/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1611363            -0.1%    1609891        will-it-scale.48.threads
     47.42 ±  2%      +1.2%      48.01        will-it-scale.48.threads_idle
     33569            -0.1%      33539        will-it-scale.per_thread_ops
   1611363            -0.1%    1609891        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/thread/100%/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1776494 ±  3%      -2.6%    1730693        will-it-scale.48.threads
     43.36 ±  4%      +0.5%      43.57 ±  2%  will-it-scale.48.threads_idle
     37010 ±  3%      -2.6%      36055        will-it-scale.per_thread_ops
   1776494 ±  3%      -2.6%    1730693        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/50%/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  15235214            -0.3%   15185167        will-it-scale.24.processes
     49.63            -0.4%      49.45        will-it-scale.24.processes_idle
    634800            -0.3%     632715        will-it-scale.per_process_ops
  15235214            -0.3%   15185167        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/50%/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   6700813            -0.6%    6662570        will-it-scale.24.processes
     49.17            +0.0%      49.18        will-it-scale.24.processes_idle
    279200            -0.6%     277606        will-it-scale.per_process_ops
   6700813            -0.6%    6662570        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/50%/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   8052059            -1.2%    7952172        will-it-scale.24.processes
     49.48            -0.4%      49.29        will-it-scale.24.processes_idle
    335502            -1.2%     331340        will-it-scale.per_process_ops
   8052059            -1.2%    7952172        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/16/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  10152559            +0.7%   10221240        will-it-scale.16.processes
     66.10            -0.0%      66.09        will-it-scale.16.processes_idle
    634534            +0.7%     638827        will-it-scale.per_process_ops
  10152559            +0.7%   10221240        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/16/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   4621434            -1.0%    4576959        will-it-scale.16.processes
     66.14            -0.2%      65.98        will-it-scale.16.processes_idle
    288839            -1.0%     286059        will-it-scale.per_process_ops
   4621434            -1.0%    4576959        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/16/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   5546153            -1.3%    5474778        will-it-scale.16.processes
     66.02            -0.1%      65.98        will-it-scale.16.processes_idle
    346634            -1.3%     342173        will-it-scale.per_process_ops
   5546153            -1.3%    5474778        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/100%/debian/dellr630/page_fault3/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  20575719            +0.4%   20651992        will-it-scale.48.processes
      0.06            +5.6%       0.06 ±  7%  will-it-scale.48.processes_idle
    428660            +0.4%     430249        will-it-scale.per_process_ops
  20575719            +0.4%   20651992        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/100%/debian/dellr630/page_fault2/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   6984071            -1.1%    6906022        will-it-scale.48.processes
      0.07            +4.8%       0.07 ±  6%  will-it-scale.48.processes_idle
    145501            -1.1%     143875        will-it-scale.per_process_ops
   6984071            -1.1%    6906022        will-it-scale.workload

=========================================================================================
compiler/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-10/defconfig/process/100%/debian/dellr630/page_fault1/will-it-scale

commit:
  5.13.0-rc1-mm1-max-order-11+
  5.13.0-rc1-mm1-max-order-20+

5.13.0-rc1-mm1-m 5.13.0-rc1-mm1-max-order-20
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   7527654            -1.7%    7399284        will-it-scale.48.processes
      0.07            +0.0%       0.07        will-it-scale.48.processes_idle
    156826            -1.7%     154151        will-it-scale.per_process_ops
   7527654            -1.7%    7399284        will-it-scale.workload
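
For anyone re-deriving the %change column from the raw numbers, it appears to be the relative difference between the two kernels' mean values. A minimal sketch, with a made-up function name:

```c
#include <assert.h>

/* Relative change of the patched mean against the baseline mean, in percent. */
static double percent_change(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}
```

For example, percent_change(3199850, 3390866) is about 5.97, which rounds to the +6.0% shown for page_fault3 in thread mode at 50%.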




—
Best Regards,
Yan, Zi
