Re: Infinite looping observed in __offline_pages
From: Michal Hocko <mhocko@kernel.org>
Date: 2018-07-30 09:16:10
Also in: lkml
On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is
> > > unmigratable and the loop in __offline_pages should terminate and
> > > goto failed_removal? Or should we allow a certain number of retries
> > > before we give up on migrating the range?
> >
> > Unfortunately not. Migration code doesn't tell the difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what
> > kind of page is not migratable and for what reason. Are you able to
> > add some debugging to give us more information? The current debugging
> > code in the hotplug/migration sucks...
>
> After reproducing the problem a couple of times, it seems that it can
> occur for different types of pages. Running page-types on the offending
> page over two separate instances produced the following:
>
> # tools/vm/page-types -a 307968-308224
>              flags  page-count  MB  symbolic-flags                               long-symbolic-flags
> 0x0000000000000400           1   0  __________B________________________________  buddy
>              total           1   0
Huh! How come a buddy page has a non-zero reference count?
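
For reference, the raw flags value in that output can be decoded by hand. The following is only a minimal sketch in Python, assuming the bit numbering documented for /proc/kpageflags (bits 0 to 10 only); the table and the helper name decode_kpageflags are illustrative and are not part of page-types itself.

```python
# Bit positions as documented for /proc/kpageflags.
# Only the first 11 bits are listed here; higher bits are ignored.
KPF_BITS = {
    0: "locked",
    1: "error",
    2: "referenced",
    3: "uptodate",
    4: "dirty",
    5: "lru",
    6: "active",
    7: "slab",
    8: "writeback",
    9: "reclaim",
    10: "buddy",
}


def decode_kpageflags(flags):
    """Return the names of the known bits set in a kpageflags value."""
    names = []
    for bit, name in sorted(KPF_BITS.items()):
        if flags & (1 << bit):
            names.append(name)
    return names


print(decode_kpageflags(0x400))
```

With the value reported above, 0x400 has only bit 10 set, which matches the "buddy" label printed by page-types.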
> And the following on a separate run:
>
> # tools/vm/page-types -a 313088-313344
>              flags  page-count  MB  symbolic-flags                               long-symbolic-flags
> 0x000000000000006c           1   0  __RU_lA____________________________________  referenced,uptodate,lru,active
>              total           1   0

Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
-- 
Michal Hocko
SUSE Labs
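
The bounded-retry idea raised in the quoted question at the top of this message can be illustrated with a small toy model. This is a sketch only, not kernel code: migrate_range is a hypothetical callable that returns 0 on success and non-zero on failure, and the function name and the default of 5 retries are assumptions made for the example. As the reply notes, the migration code does not distinguish ephemeral from permanent failures, so a cap like this bounds the loop without saying why the range failed.

```python
def offline_with_retry_limit(migrate_range, max_retries=5):
    """Call migrate_range() until it succeeds or the retry budget runs out.

    migrate_range is a hypothetical stand-in: it returns 0 on success and
    a non-zero value on failure. Returns the number of attempts used on
    success, or None when the budget is exhausted.
    """
    for attempt in range(1, max_retries + 1):
        if migrate_range() == 0:
            return attempt
    # Budget exhausted: a caller would give up here (the failed_removal
    # path mentioned in the question).
    return None


# Example: a range that fails twice and then migrates on the third try.
results = iter([1, 1, 0])
print(offline_with_retry_limit(lambda: next(results)))
```

A range that never migrates makes the function return None after max_retries attempts instead of looping forever.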