Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3
From: Mel Gorman <hidden>
Date: 2016-11-30 14:06:26
Also in:
lkml
On Wed, Nov 30, 2016 at 01:40:34PM +0100, Jesper Dangaard Brouer wrote:
On Sun, 27 Nov 2016 13:19:54 +0000 Mel Gorman [off-list ref] wrote:
[...]
SLUB has been the default small kernel object allocator for quite some time but it is not universally used due to performance concerns and a reliance on high-order pages. The high-order concerns have two major components -- high-order pages are not always available and high-order page allocations potentially contend on the zone->lock. This patch addresses some concerns about the zone lock contention by extending the per-cpu page allocator to cache high-order pages. The patch makes the following modifications

o New per-cpu lists are added to cache the high-order pages. This increases the cache footprint of the per-cpu allocator and overall usage but for some workloads, this will be offset by reduced contention on zone->lock.

This will also help the performance of NIC drivers that allocate higher-order pages for their RX-ring queue (and chop them up for MTU). I do like this patch, even though I'm working on moving drivers away from allocating these high-order pages.

Acked-by: Jesper Dangaard Brouer <redacted>
Thanks.
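To make the zone->lock argument in the patch description concrete, here is a toy model and not the kernel code. It counts how often a shared "zone" lock would be taken when a per-cpu cache either does or does not hold blocks of a given order. The class names, the batch and high values, and the alloc-then-free loop are all assumptions chosen for illustration; nothing here is taken from the patch itself.

```python
class Zone:
    """Stand-in for a zone's free lists; only counts lock round-trips."""

    def __init__(self):
        self.lock_acquisitions = 0
        self.next_block = 0

    def alloc_bulk(self, order, count):
        # One lock round-trip hands out `count` blocks of 2**order pages.
        self.lock_acquisitions += 1
        first = self.next_block
        self.next_block += count
        return [(order, first + i) for i in range(count)]

    def free_bulk(self, blocks):
        # One lock round-trip takes back any number of blocks.
        self.lock_acquisitions += 1


class PerCpuCache:
    """Hypothetical per-cpu cache with one list per cached order."""

    def __init__(self, zone, max_cached_order, batch=8, high=24):
        self.zone = zone
        self.max_cached_order = max_cached_order
        self.batch = batch  # blocks moved per refill or drain (assumed value)
        self.high = high    # drain once a list grows past this (assumed value)
        self.lists = {o: [] for o in range(max_cached_order + 1)}

    def alloc(self, order):
        if order > self.max_cached_order:
            # Uncached order: every allocation goes to the zone.
            return self.zone.alloc_bulk(order, 1)[0]
        cached = self.lists[order]
        if not cached:
            cached.extend(self.zone.alloc_bulk(order, self.batch))
        return cached.pop()

    def free(self, block):
        order = block[0]
        if order > self.max_cached_order:
            self.zone.free_bulk([block])
            return
        cached = self.lists[order]
        cached.append(block)
        if len(cached) > self.high:
            self.zone.free_bulk([cached.pop() for _ in range(self.batch)])


def lock_acquisitions(max_cached_order, order=2, iterations=1000):
    """Zone lock round-trips for `iterations` alloc/free pairs of `order`."""
    zone = Zone()
    cpu = PerCpuCache(zone, max_cached_order)
    for _ in range(iterations):
        cpu.free(cpu.alloc(order))
    return zone.lock_acquisitions


if __name__ == "__main__":
    print("order-0 caching only:", lock_acquisitions(max_cached_order=0))
    print("orders 0..3 cached:  ", lock_acquisitions(max_cached_order=3))
```

An immediate alloc/free pair is the most favourable pattern for any cache, so the gap this prints is an upper bound on the effect and not a prediction for a real workload.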
[...]
This is the result from netperf running UDP_STREAM on localhost. It was selected on the basis that it is slab-intensive and has been the subject of previous SLAB vs SLUB comparisons with the caveat that this is not testing between two physical hosts.

I do like that you are using a networking test to benchmark this. Looking at the results, my initial response is that the improvements are basically too good to be true.
FWIW, LKP independently measured the boost to be 23% so it's expected there will be different results depending on exact configuration and CPU.
Can you share how you tested this with netperf and the specific netperf parameters?
The mmtests config file used is configs/config-global-dhp__network-netperf-unbound so all details can be extrapolated or reproduced from that.
e.g. How do you configure the send/recv sizes?
Static range of sizes specified in the config file.
Have you pinned netperf and netserver on different CPUs?
No. While it's possible to do a pinned test which helps stability, it also tends to be less reflective of what happens in a variety of workloads so I took the "harder" option.
For localhost testing, when netperf and netserver run on the same CPU, you observe half the performance, quite intuitively. When pinning netperf and netserver to different CPUs (via e.g. option -T 1,2) you observe the most stable results. When allowing netperf and netserver to migrate between CPUs (the default setting), the real fun starts and the results become unstable, because now the CPU scheduler is also being tested. My experience is that more "fun" memory situations also occur, as I guess we are hopping between more per-CPU alloc caches (also affecting the SLUB per-CPU usage pattern).
Yes, which is another reason why I used an unbound configuration. I didn't want to get an artificial boost from a pinned server/client using the same per-cpu caches. As a side-effect, it may mean that machines with fewer CPUs get a greater boost as there are fewer per-cpu caches being used.
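For anyone wanting to compare the two setups discussed above by hand, the difference on the netperf command line is just the -T option. The helper below is a rough sketch that only builds the argument list. The function name, the 1024-byte sizes, the 10 second run length and the CPU numbers are placeholders, not the values from the mmtests config.

```python
import shlex


def netperf_cmd(send_size, recv_size, pin=None, host="127.0.0.1", seconds=10):
    """Build a netperf UDP_STREAM command line as a list of arguments.

    pin is None for an unbound run, or a (netperf_cpu, netserver_cpu)
    tuple to bind both ends with netperf's global -T option.
    """
    cmd = ["netperf", "-H", host, "-t", "UDP_STREAM", "-l", str(seconds)]
    if pin is not None:
        local_cpu, remote_cpu = pin
        cmd += ["-T", f"{local_cpu},{remote_cpu}"]
    # Test-specific options follow "--": -m is the send size and
    # -M the receive size, both in bytes.
    cmd += ["--", "-m", str(send_size), "-M", str(recv_size)]
    return cmd


if __name__ == "__main__":
    # Unbound: the scheduler may migrate both ends between CPUs.
    print(shlex.join(netperf_cmd(1024, 1024)))
    # Pinned: netperf on CPU 1, netserver on CPU 2.
    print(shlex.join(netperf_cmd(1024, 1024, pin=(1, 2))))
```

The list can be handed to subprocess.run() once netserver is running on the same machine. The unbound form is the one that corresponds to the results being discussed here.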
2-socket modern machine
                4.9.0-rc5    4.9.0-rc5
                  vanilla    hopcpu-v3

The kernel from 4.9.0-rc5-vanilla to 4.9.0-rc5-hopcpu-v3 only contains this single change, right?
Yes.

-- 
Mel Gorman
SUSE Labs