Re: Regression in mobility grouping?
From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2016-09-28 15:39:44
Also in:
lkml
Hi Vlastimil,

On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> >
> > we noticed what looks like a regression in page mobility grouping during an upgrade from 3.10 to 4.0. Identical machines, workloads, and uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> >
> > Number of blocks type  Unmovable  Reclaimable  Movable  Reserve  Isolate
> > Node 1, zone   Normal        815          433    31518        2        0
> >
> > and on 4.0 like this:
> >
> > Number of blocks type  Unmovable  Reclaimable  Movable  Reserve  CMA  Isolate
> > Node 1, zone   Normal       3880         3530    25356        2    0        0
>
> It's worth keeping in mind that this doesn't reflect where the actual unmovable pages reside. It might be that in 3.10 they are spread within the movable pages. IIRC enabling page_owner (not sure if in 4.0, there were some later fixes I think) can augment pagetypeinfo with at least some statistics of polluted pageblocks.
Thanks, I'll look at the mixed block counts. I failed to make clear, we saw that issue in the switch from 3.10 to 4.0, and I mentioned those two kernels as last known good / first known bad. But later kernels - we tried with 4.6 - look the same. This appears to be a regression in (higher-order) allocation service quality somewhere after 3.10 that persists into current kernels.
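
To compare the block counts quoted above as shares rather than raw numbers, here is a minimal Python sketch. It assumes the "Number of blocks type" section of /proc/pagetypeinfo has the layout shown in this thread; the function names and sample strings are made up for illustration and are not part of any kernel tooling.

```python
def parse_block_counts(text):
    """Return {(node, zone): {migratetype: pageblock_count}}."""
    counts = {}
    header = None
    for line in text.splitlines():
        if line.startswith("Number of blocks type"):
            # Column names follow the four-word prefix.
            header = line.split()[4:]
            continue
        if header and line.startswith("Node"):
            # e.g. "Node 1, zone   Normal   815   433 ..."
            left, right = line.split("zone", 1)
            node = int(left.replace("Node", "").strip().rstrip(","))
            fields = right.split()
            zone = fields[0]
            values = [int(v) for v in fields[1:]]
            counts[(node, zone)] = dict(zip(header, values))
    return counts


def block_shares(row):
    """Convert per-migratetype block counts into fractions of the zone."""
    total = sum(row.values())
    return {name: count / total for name, count in row.items()}


# Sample input copied from the numbers quoted in this thread.
SAMPLE_310 = """Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
Node 1, zone Normal 815 433 31518 2 0
"""
SAMPLE_40 = """Number of blocks type Unmovable Reclaimable Movable Reserve CMA Isolate
Node 1, zone Normal 3880 3530 25356 2 0 0
"""

for label, sample in (("3.10", SAMPLE_310), ("4.0", SAMPLE_40)):
    row = parse_block_counts(sample)[(1, "Normal")]
    shares = block_shares(row)
    print(label, {name: f"{share:.1%}" for name, share in shares.items()})
```

Both samples add up to 32768 pageblocks. The unmovable share goes from about 2.5% to about 11.8%, and the reclaimable share from about 1.3% to about 10.8%.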
> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory there should be allocated and if it would fill the respective pageblocks, or if they are poorly utilized?
They are very poorly utilized. On a machine with 90% anon/cache pages alone we saw 50% of the page blocks unmovable.
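
The utilization question can be approximated by comparing /proc/meminfo against the pageblock counts. The sketch below is a rough estimate under stated assumptions: 2 MiB pageblocks (order-9 with 4 KiB pages), and SUnreclaim + KernelStack + PageTables as a stand-in for unmovable memory, which undercounts other unmovable allocations. Note also that /proc/meminfo is system-wide while the block counts are per zone, so counts from all zones would need to be summed first. The helper names are hypothetical.

```python
PAGEBLOCK_KB = 2048  # assumption: 2 MiB pageblocks (order-9, 4 KiB pages)


def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def estimate_unmovable_kb(info):
    """Rough proxy for unmovable memory; an assumption, not an exact figure."""
    return sum(info.get(k, 0) for k in ("SUnreclaim", "KernelStack", "PageTables"))


def block_utilization(used_kb, n_blocks, pageblock_kb=PAGEBLOCK_KB):
    """Fraction of the capacity of n_blocks pageblocks that used_kb would fill."""
    if n_blocks == 0:
        return 0.0
    return used_kb / (n_blocks * pageblock_kb)
```

A result well below 1.0 for the unmovable blocks would match the "poorly utilized" observation; SReclaimable can be compared against the reclaimable block count in the same way.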
> > 4.0 is either polluting pageblocks more aggressively at allocation, or is not able to make pageblocks movable again when the reclaimable and unmovable allocations are released. Invoking compaction manually (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high rate of order-3 allocations (fdtable during fork, network rx), and after the upgrade allocstalls have increased dramatically. I'm not entirely sure this is the same issue, since even order-0 allocations are struggling, but the mobility grouping in itself looks problematic.
> >
> > I'm still going through the changes relevant to mobility grouping in that timeframe, but if this rings a bell for anyone, it would help. I hate blaming random patches, but these caught my eye:
> >
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
>
> Check also the changelogs for mentions of earlier commits, e.g. 99592d5 should be restoring behavior that changed in 3.12-3.13 and you are upgrading from 3.10.
Good point.
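
For anyone wanting to reproduce the allocstall comparison and the manual compaction step mentioned above, a minimal Python sketch follows. It assumes the allocation stall counters appear in /proc/vmstat under names starting with "allocstall" (a single counter on the kernels discussed here; some later kernels appear to split it per zone), and that writing to /proc/sys/vm/compact_memory requires root. The function names are made up for illustration.

```python
def allocstalls(vmstat_text):
    """Sum every counter whose name starts with 'allocstall'."""
    total = 0
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith("allocstall") and value.strip().isdigit():
            total += int(value)
    return total


def read_allocstalls(path="/proc/vmstat"):
    """Read the current allocstall total from the running kernel."""
    with open(path) as f:
        return allocstalls(f.read())


def trigger_compaction(path="/proc/sys/vm/compact_memory"):
    """Ask the kernel to compact all zones; needs root privileges."""
    with open(path, "w") as f:
        f.write("1")
```

Sampling read_allocstalls() at a fixed interval and taking the difference gives a stall rate that can be compared between the two kernels under the same workload.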
> > The changelog states that by aggressively stealing split buddy pages during a fallback allocation we avoid subsequent stealing. But since there are generally more movable/reclaimable pages available, and so less falling back and stealing freepages on behalf of movable, won't this mean that we could expect exactly that result - growing numbers of unmovable blocks, while rarely stealing them back in movable alloc fallbacks? And the expansion of !MOVABLE blocks would over time make compaction less and less effective too, seeing as it doesn't consider anything !MOVABLE suitable migration targets?
>
> Yeah this is an issue with compaction that was brought up recently and I want to tackle next.
Agreed, it would be nice if compaction could reclaim unmovable and reclaimable blocks whose polluting allocations have since been freed. But there is a limit to how lazy mobility grouping can be and still expect compaction to fix it up. If 50% of the page blocks are marked unmovable, we don't pack incoming polluting allocations. When spread out the right way, even just a few of those can have a devastating impact on overall compactability. So regardless of future compaction improvements, we need to get anti-frag accuracy in the allocator closer to 3.10 levels again.
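
To put a number on how few badly placed allocations it takes, here is a small worked calculation in Python. It assumes 512 pages per pageblock (2 MiB blocks with 4 KiB pages) and uses the 32768 pageblocks that the Node 1 Normal zone counts in this thread add up to; the function name is hypothetical.

```python
PAGES_PER_BLOCK = 512  # assumption: 2 MiB pageblocks with 4 KiB pages


def min_polluting_fraction(total_blocks, polluted_blocks,
                           pages_per_block=PAGES_PER_BLOCK):
    """Smallest fraction of a zone's pages that can pollute polluted_blocks,
    i.e. exactly one unmovable page in each polluted block."""
    return polluted_blocks / (total_blocks * pages_per_block)


# Worst case: one stray unmovable page in each of half the zone's blocks.
fraction = min_polluting_fraction(total_blocks=32768, polluted_blocks=16384)
print(f"{fraction:.4%} of the zone's pages")
```

In this worst case, roughly 0.1% of the zone's pages is enough to leave half of its pageblocks containing a page that compaction cannot move.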
> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both kernels on machines with similar uptimes and directly after invoking compaction. As you can see, the buddy lists are much more fragmented on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> >
> > Any thoughts on this would be greatly appreciated. I can test patches.
>
> I guess testing revert of 9c0415e could give us some idea. Commit 3a1086f shouldn't result in pageblock marking differences and as I said above, 99592d5 should be just restoring to what 3.10 did.
I can give this a shot, but note that this commit makes only unmovable stealing more aggressive. We see reclaimable blocks up as well.

The workload is fairly variable, so it'll take about a day to smooth out a meaningful average.

Thanks for your insights, Vlastimil!