Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
From: Mike Kravetz <hidden>
Date: 2021-09-02 18:17:39
Also in:
lkml
On 8/30/21 3:11 AM, Vlastimil Babka wrote:
> On 8/28/21 01:04, Mike Kravetz wrote:
>> On 8/27/21 10:22 AM, Vlastimil Babka wrote:
>>
>> I 'may' have been over stressing the system with all CPUs doing file
>> reads to fill the page cache with clean pages. I certainly need to
>> spend some more debug/analysis time on this.
>
> Hm that *could* play a role, as these will allow reclaim to make
> progress, but also the reclaimed pages might be stolen immediately and
> compaction will return COMPACT_SKIPPED and in should_compact_retry() we
> might go through this code path:
>
>         /*
>          * compaction was skipped because there are not enough order-0 pages
>          * to work with, so we retry only if it looks like reclaim can help.
>          */
>         if (compaction_needs_reclaim(compact_result)) {
>                 ret = compaction_zonelist_suitable(ac, order, alloc_flags);
>                 goto out;
>         }
>
> where compaction_zonelist_suitable() will return true because it
> appears reclaim can free pages to allow progress. And there are no max
> retries applied for this case.
>
> With the reclaim and compaction tracepoints it should be possible to
> confirm this scenario.
Here is some very high-level information from a long stall that was
interrupted. This was an order 9 allocation from alloc_buddy_huge_page().

[55269.530564] __alloc_pages_slowpath: jiffies 47329325 tries 609673 cpu_tries 1 node 0 FAIL
[55269.539893] r_tries 25 c_tries 609647 reclaim 47325161 compact 607

Yes, the task was in __alloc_pages_slowpath for 47329325 jiffies before
being interrupted. should_reclaim_retry() returned true 25 times and
should_compact_retry() returned true 609647 times. Almost all of the
time (47325161 jiffies) was spent in __alloc_pages_direct_reclaim(),
and only 607 jiffies in __alloc_pages_direct_compact().

It looks like both the reclaim retries exceeded MAX_RECLAIM_RETRIES and
the compaction retries exceeded MAX_COMPACT_RETRIES.
-- 
Mike Kravetz