Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
From: Mike Kravetz <hidden>
Date: 2021-09-08 21:00:41
Also in: lkml
On 9/7/21 1:50 AM, Hillf Danton wrote:
> On Mon, 6 Sep 2021 16:40:28 +0200 Vlastimil Babka wrote:
>> On 9/2/21 20:17, Mike Kravetz wrote:
>>> Here is some very high level information from a long stall that was
>>> interrupted.  This was an order 9 allocation from alloc_buddy_huge_page().
>>>
>>> [55269.530564] __alloc_pages_slowpath: jiffies 47329325 tries 609673 cpu_tries 1 node 0 FAIL
>>> [55269.539893]     r_tries 25  c_tries 609647  reclaim 47325161  compact 607
>>>
>>> Yes, in __alloc_pages_slowpath for 47329325 jiffies before being interrupted.
>>> should_reclaim_retry returned true 25 times and should_compact_retry returned
>>> true 609647 times.
>>> Almost all time (47325161 jiffies) spent in __alloc_pages_direct_reclaim, and
>>> 607 jiffies spent in __alloc_pages_direct_compact.
>>>
>>> Looks like both
>>> reclaim retries > MAX_RECLAIM_RETRIES
>>> and
>>> compaction retries > MAX_COMPACT_RETRIES
>>
>> Yeah AFAICS that's only possible with the scenario I suspected. I guess
>> we should put a limit on compact retries (maybe some multiple of
>> MAX_COMPACT_RETRIES) even if it thinks that reclaim could help, while
>> clearly it doesn't (i.e. because somebody else is stealing the page like
>> in your test case).
>
> And/or clamp reclaim retries for costly orders
>
>	reclaim retries = MAX_RECLAIM_RETRIES - order;
>
> to pull down the chance for stall as low as possible.

Thanks, and sorry for not replying quickly.  I only get back to this as
time allows.

We could clamp the number of compaction and reclaim retries in
__alloc_pages_slowpath as suggested.  However, I noticed that a single
reclaim call could take a bunch of time.  As a result, I instrumented
shrink_node to see what might be happening.  Here is some information
from a long stall.  Note that I only dump stats when jiffies > 100000.

[ 8136.874706] shrink_node: 507654 total jiffies,  3557110 tries
[ 8136.881130]              130596341 reclaimed, 32 nr_to_reclaim
[ 8136.887643]              compaction_suitable results:
[ 8136.893276]     idx COMPACT_SKIPPED, 3557109
[ 8672.399839] shrink_node: 522076 total jiffies,  3466228 tries
[ 8672.406268]              124427720 reclaimed, 32 nr_to_reclaim
[ 8672.412782]              compaction_suitable results:
[ 8672.418421]     idx COMPACT_SKIPPED, 3466227
[ 8908.099592] __alloc_pages_slowpath: jiffies 2939938  tries 17068 cpu_tries 1 node 0 success
[ 8908.109120]     r_tries 11  c_tries 17056  reclaim 2939865  compact 9

In this case, clamping the number of retries from should_compact_retry
and should_reclaim_retry could help.  Mostly because we will not be
calling back into the reclaim code?  Notice the long amount of time spent
in shrink_node.  The 'tries' in shrink_node come from this check:

	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
				    sc))
		goto again;

The compaction_suitable results are the values returned from calls to
should_continue_reclaim -> compaction_suitable.

Trying to think if there might be an intelligent way to quit early.
-- 
Mike Kravetz
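
For illustration only, the two clamping ideas discussed in this thread (a reclaim retry budget of MAX_RECLAIM_RETRIES - order for costly orders, and a cap on compaction retries at some multiple of MAX_COMPACT_RETRIES) can be modeled in user-space C.  This is a rough sketch under stated assumptions, not kernel code and not a proposed patch: the helper names, the floor of one retry, and the multiplier of 4 are all hypothetical, and the three constants are assumed to match mainline values of that era.

```c
/* Assumed to match mainline kernel values at the time of this thread. */
#define MAX_RECLAIM_RETRIES     16
#define MAX_COMPACT_RETRIES     16
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical: the thread only says "some multiple". */
#define COMPACT_RETRY_MULTIPLE  4

/*
 * Hypothetical helper: reclaim retry budget, reduced by the order for
 * costly allocations as suggested in the thread.  The floor of 1 is an
 * assumption so that very large orders still get one retry.
 */
static int reclaim_retry_limit(unsigned int order)
{
	int limit = MAX_RECLAIM_RETRIES;

	if (order > PAGE_ALLOC_COSTLY_ORDER)
		limit -= (int)order;

	return limit > 1 ? limit : 1;
}

/* Hypothetical helper: hard cap on compaction retries. */
static int compact_retry_limit(void)
{
	return MAX_COMPACT_RETRIES * COMPACT_RETRY_MULTIPLE;
}

/*
 * Toy model of a slowpath in which every allocation attempt fails and
 * the retry heuristics would always answer "retry".  Only the two caps
 * above stop the loop.  Returns the total number of attempts.
 */
static long simulate_failing_slowpath(unsigned int order)
{
	int r_tries = 0, c_tries = 0;
	long tries = 0;

	for (;;) {
		tries++;		/* one failed allocation attempt */

		if (r_tries < reclaim_retry_limit(order)) {
			r_tries++;	/* models should_reclaim_retry() == true */
			continue;
		}
		if (c_tries < compact_retry_limit()) {
			c_tries++;	/* models should_compact_retry() == true */
			continue;
		}
		break;			/* both budgets exhausted: give up */
	}

	return tries;
}
```

With these assumed numbers, an order 9 request gets 7 reclaim retries and 64 compaction retries, so simulate_failing_slowpath(9) gives up after 72 attempts instead of the hundreds of thousands seen in the logs above.  The model does nothing about the time spent inside a single shrink_node call, which is the separate problem raised in the reply.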