Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
From: Jesper Dangaard Brouer <hidden>
Date: 2021-05-31 16:59:52
Also in:
lkml, netdev
On Mon, 31 May 2021 13:04:12 +0100 Mel Gorman wrote:
> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations, including SLUB, contend
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.
>
> Note that this is not necessarily a universal performance win because
> of how it is implemented. High-order pages can cause pcp->high to be
> exceeded prematurely for lower orders; for example, a large number of
> THP pages being freed could release order-0 pages from the PCP lists.
> Hence, much depends on the allocation/free pattern as observed by a
> single CPU to determine if caching helps or hurts a particular workload.
> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant paths, as some of the network
> allocations are high-order.
This series[1] looks very interesting! I can confirm that some network
allocations do use high-order pages. Thus, I think this will improve
network performance in general, as your results below confirm:
> netperf-udp
>                                 5.13.0-rc2             5.13.0-rc2
>                           mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64        261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128       516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256      1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024     3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048     7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312    11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096    13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192    21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384   31902.32 (   0.00%)    32637.50 *   2.30%*
> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.

Thanks for working on this, Mel; it is great to see! :-)

[1] https://lore.kernel.org/linux-mm/20210531120412.17411-3-mgorman@techsingularity.net/

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer