Thread: 13 messages, 2 authors, 2016-01-29

Re: [PATCH 4/5] mm: workingset: eviction buckets for bigmem/lowbit machines

From: Vladimir Davydov <hidden>
Date: 2016-01-27 14:39:58
Also in: linux-mm, lkml

On Tue, Jan 26, 2016 at 04:00:05PM -0500, Johannes Weiner wrote:
For per-cgroup thrash detection, we need to store the memcg ID inside
the radix tree cookie as well. However, on 32-bit that doesn't leave
enough bits for the eviction timestamp to cover the necessary range of
recently evicted pages. The radix tree entry would look like this:

[ RADIX_TREE_EXCEPTIONAL(2) | ZONEID(2) | MEMCGID(16) | EVICTION(12) ]

12 bits means 4096 pages, or 16M worth of recently evicted pages.
But refaults are actionable up to distances covering half of memory.
To not miss refaults, we have to stretch out the range at the cost of
how precisely we can tell when a page was evicted. This way we can
shave off lower bits from the eviction timestamp until the necessary
range is covered. E.g. grouping evictions into 1M buckets (256 pages)
will stretch the longest representable refault distance to 4G.
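
The arithmetic above can be checked with a small userspace sketch. It is not code from the patch: the helper names are made up for illustration, and it assumes 4K pages (which is what "4096 pages = 16M" implies) and the 32-bit field widths from the layout quoted earlier.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define ASSUMED_PAGE_SHIFT 12

/* Field widths taken from the 32-bit entry layout in the commit message. */
enum {
	ENTRY_BITS       = 32,
	EXCEPTIONAL_BITS = 2,
	ZONEID_BITS      = 2,
	MEMCGID_BITS     = 16,
};

/* Bits left over for the eviction timestamp (hypothetical helper). */
static unsigned int timestamp_bits(void)
{
	return ENTRY_BITS - EXCEPTIONAL_BITS - ZONEID_BITS - MEMCGID_BITS;
}

/*
 * Longest representable refault distance in MiB when evictions are
 * grouped into buckets of 2^bucket_order pages (hypothetical helper).
 */
static unsigned long long max_distance_mib(unsigned int ts_bits,
					   unsigned int bucket_order)
{
	unsigned int shift = ts_bits + bucket_order + ASSUMED_PAGE_SHIFT;

	/* Keep the shift within what a 64-bit value can hold. */
	assert(shift >= 20 && shift < 64);
	return 1ULL << (shift - 20);
}
```

With 12 timestamp bits, `max_distance_mib(12, 0)` gives 16 (the 16M case) and `max_distance_mib(12, 8)` gives 4096 (256-page buckets, the 4G case).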

This patch implements eviction buckets that are automatically sized
according to the available bits and the necessary refault range, in
preparation for per-cgroup thrash detection.

The maximum actionable distance is currently half of memory, but to
support memory hotplug of up to 200% of boot-time memory, we size the
buckets to cover double the distance. Beyond that, thrashing won't be
detectable anymore.

During boot, the kernel will print out the exact parameters, like so:

[    0.113929] workingset: timestamp_bits=12 max_order=18 bucket_order=6

In this example, there are 12 radix entry bits available for the
eviction timestamp, to cover a maximum distance of 2^18 pages (this is
a 1G machine). Consequently, evictions must be grouped into buckets of
2^6 pages, or 256K.
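
A minimal sketch of how the printed parameters relate, assuming the bucket order is simply the number of bits by which max_order exceeds timestamp_bits. The function names below are hypothetical and this is userspace C, not the patch itself; 4K pages are assumed for the size conversion.

```c
#include <assert.h>

/* Number of low timestamp bits to shave off (hypothetical helper). */
static unsigned int compute_bucket_order(unsigned int timestamp_bits,
					 unsigned int max_order)
{
	/* Only group evictions when the range does not fit as-is. */
	return max_order > timestamp_bits ? max_order - timestamp_bits : 0;
}

/* Bucket size in KiB, assuming 4 KiB pages. */
static unsigned long bucket_size_kib(unsigned int bucket_order)
{
	return (1UL << bucket_order) * 4;
}

/* Store an eviction counter at bucket granularity. */
static unsigned long to_bucket(unsigned long eviction,
			       unsigned int bucket_order)
{
	return eviction >> bucket_order;
}

/* Recover the (rounded-down) eviction counter from a stored bucket. */
static unsigned long from_bucket(unsigned long stored,
				 unsigned int bucket_order)
{
	return stored << bucket_order;
}
```

For the boot message above, `compute_bucket_order(12, 18)` yields 6 and `bucket_size_kib(6)` yields 256. The round trip through `to_bucket()` and `from_bucket()` loses the low 6 bits, which is the precision traded for range.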

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <redacted>

One nit below.

+/*
+ * Eviction timestamps need to be able to cover the full range of
+ * actionable refaults. However, bits are tight in the radix tree
+ * entry, and after storing the identifier for the lruvec there might
+ * not be enough left to represent every single actionable refault. In
+ * that case, we have to sacrifice granularity for distance, and group
+ * evictions into coarser buckets by shaving off lower timestamp bits.
+ */
+static unsigned int bucket_order;

__read_mostly?

Thanks,
Vladimir