Thread: 6 messages, 2 authors, 2021-06-02

Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists

From: Jesper Dangaard Brouer <hidden>
Date: 2021-05-31 16:59:52
Also in: lkml, netdev

On Mon, 31 May 2021 13:04:12 +0100
Mel Gorman [off-list ref] wrote:
> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations including SLUB contend
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.

> Note that this is not necessarily a universal performance win because of
> how it is implemented. High-order pages can cause pcp->high to be exceeded
> prematurely for lower-orders so for example, a large number of THP pages
> being freed could release order-0 pages from the PCP lists. Hence, much
> depends on the allocation/free pattern as observed by a single CPU to
> determine if caching helps or hurts a particular workload.
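
The effect described in the paragraph above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not the kernel implementation: `ToyPCP` is a hypothetical class, the cache is assumed to account for its contents in base pages (an order-n block counts as 2^n pages), and it drains oldest-first once the limit is exceeded, which is simpler than whatever policy the real allocator uses.

```python
from collections import deque


class ToyPCP:
    """Toy model of a per-CPU page cache with one shared limit (hypothetical)."""

    def __init__(self, high):
        self.high = high        # limit on cached base pages, stands in for pcp->high
        self.count = 0          # base pages currently cached, all orders combined
        self.cached = deque()   # orders of cached blocks, oldest first
        self.released = {}      # order -> number of blocks pushed back to the zone

    def free(self, order):
        """Cache a freed block, then drain oldest blocks while over the limit."""
        self.cached.append(order)
        self.count += 1 << order
        while self.count > self.high and self.cached:
            victim = self.cached.popleft()   # assumption: oldest-first draining
            self.count -= 1 << victim
            self.released[victim] = self.released.get(victim, 0) + 1


pcp = ToyPCP(high=600)
for _ in range(100):
    pcp.free(0)      # 100 order-0 pages sit comfortably under the limit
for _ in range(2):
    pcp.free(9)      # two order-9 (THP-sized) frees, 512 base pages each
print(pcp.released, pcp.count)
```

In this model the two order-9 frees are enough to push every cached order-0 page out of the cache, which is the kind of interaction between orders that makes the benefit depend on the per-CPU allocation/free pattern.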

> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant patches as some of the network
> allocations are high-order.
This series[1] looks very interesting!  I can confirm that some network
allocations are high-order.  Thus, I think this will improve network
performance in general, as your results below show:
> netperf-udp
>                                  5.13.0-rc2             5.13.0-rc2
>                            mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
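
The percentage column in the table above can be recomputed from the two Hmean columns as a relative gain over the baseline kernel. The snippet below is a small sketch of that arithmetic; the names `rows` and `gain_percent` are made up for illustration, and the figures are copied from the table.

```python
# (baseline mm-pcpburst-v3r4, patched mm-pcphighorder-v1r7, reported gain in %)
rows = {
    "send-64":    (261.46,   266.30,   1.85),
    "send-128":   (516.35,   536.78,   3.96),
    "send-256":   (1014.13,  1034.63,  2.02),
    "send-1024":  (3907.65,  4046.11,  3.54),
    "send-2048":  (7492.93,  7754.85,  3.50),
    "send-3312":  (11410.04, 11772.32, 3.18),
    "send-4096":  (13521.95, 13912.34, 2.89),
    "send-8192":  (21660.50, 22730.72, 4.94),
    "send-16384": (31902.32, 32637.50, 2.30),
}


def gain_percent(baseline, patched):
    """Relative improvement of patched over baseline, in percent."""
    return (patched / baseline - 1.0) * 100.0


for name, (baseline, patched, reported) in rows.items():
    computed = gain_percent(baseline, patched)
    print(f"{name:<11} computed {computed:5.2f}%  reported {reported:5.2f}%")
```

Every message size shows a gain of roughly 2% to 5%, with the largest at send-8192.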

> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.
Thanks for working on this Mel, it is great to see! :-)

Message-Id: [ref]
 [1] https://lore.kernel.org/linux-mm/20210531120412.17411-3-mgorman@techsingularity.net/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
