Thread: 12 messages, 6 authors, 2018-08-23

Re: Infinite looping observed in __offline_pages

From: Michal Hocko <mhocko@kernel.org>
Date: 2018-07-30 09:16:10
Also in: lkml

On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal? Or
> > > should we allow a certain number of retries before we
> > > give up on migrating the range?
> >
> > Unfortunately not. Migration code doesn't tell the difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> >
> > Are you able to add some debugging to give us more information? The
> > current debugging code in the hotplug/migration sucks...
>
> After reproducing the problem a couple of times, it seems that it can occur for
> different types of pages. Running page-types on the offending page over two
> separate instances produced the following:
>
> # tools/vm/page-types -a 307968-308224
>             flags	page-count       MB  symbolic-flags			long-symbolic-flags
> 0x0000000000000400	         1        0  __________B________________________________	buddy
>             total	         1        0

Huh! How come a buddy page has a non-zero reference count?

> And the following on a separate run:
>
> # tools/vm/page-types -a 313088-313344
>             flags	page-count       MB  symbolic-flags			long-symbolic-flags
> 0x000000000000006c	         1        0  __RU_lA____________________________________	referenced,uptodate,lru,active
>             total	         1        0

Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
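
For reference, the raw flags values in the two page-types outputs above can be decoded by hand. The following is only a minimal sketch in C, assuming the low bit positions documented for /proc/kpageflags (bit 2 referenced, bit 3 uptodate, bit 5 lru, bit 6 active, bit 10 buddy). The helper name decode_kpageflags is hypothetical, and only bits 0 through 10 are covered:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names for the low /proc/kpageflags bits; array index == bit number. */
static const char *const kpf_names[] = {
	"locked", "error", "referenced", "uptodate", "dirty", "lru",
	"active", "slab", "writeback", "reclaim", "buddy",
};

/*
 * Hypothetical helper (not part of page-types): write a comma-separated
 * list of the set flag names into buf, truncating if buf is too small.
 */
static void decode_kpageflags(uint64_t flags, char *buf, size_t len)
{
	size_t nbits = sizeof(kpf_names) / sizeof(kpf_names[0]);
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < nbits; i++) {
		if (!(flags & (UINT64_C(1) << i)))
			continue;	/* bit i not set */
		if (buf[0] != '\0')
			strncat(buf, ",", len - strlen(buf) - 1);
		strncat(buf, kpf_names[i], len - strlen(buf) - 1);
	}
}
```

With a 128-byte buffer, decode_kpageflags(0x400, buf, sizeof(buf)) yields "buddy" and decode_kpageflags(0x6c, buf, sizeof(buf)) yields "referenced,uptodate,lru,active", matching the long-symbolic-flags column in both runs.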
-- 
Michal Hocko
SUSE Labs
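
The bounded-retry alternative raised in the quoted question could look roughly like the sketch below. This is userspace C for illustration, not kernel code: offline_range_bounded, MAX_MIGRATE_RETRIES, the migrate_fn callback type, and the fail_n_times demo callback are all hypothetical names. It treats every failure as possibly transient, since the migration code does not report whether a failure is permanent, and simply gives up once the retry budget is spent:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MIGRATE_RETRIES 5	/* hypothetical cap, not a kernel constant */

/* Hypothetical stand-in for one migration attempt: 0 on success, <0 on failure. */
typedef int (*migrate_fn)(unsigned long start_pfn, unsigned long end_pfn,
			  void *ctx);

/*
 * Retry a migration attempt a bounded number of times.
 * Returns 0 on success, or -EBUSY once the retry budget is exhausted.
 */
static int offline_range_bounded(migrate_fn try_migrate_range,
				 unsigned long start_pfn,
				 unsigned long end_pfn, void *ctx)
{
	int attempt;

	assert(try_migrate_range != NULL);

	for (attempt = 0; attempt < MAX_MIGRATE_RETRIES; attempt++) {
		if (try_migrate_range(start_pfn, end_pfn, ctx) == 0)
			return 0;	/* range migrated */
	}
	return -EBUSY;			/* give up instead of looping forever */
}

/* Demo callback: fails while *ctx (an int counter) is positive, then succeeds. */
static int fail_n_times(unsigned long start_pfn, unsigned long end_pfn,
			void *ctx)
{
	int *failures_left = ctx;

	(void)start_pfn;
	(void)end_pfn;
	if (*failures_left > 0) {
		(*failures_left)--;
		return -EAGAIN;
	}
	return 0;
}
```

A callback that fails three times succeeds on the fourth attempt, while one that always fails makes the loop return -EBUSY after five attempts. The trade-off is that a fixed cap can also abort an offline operation that would have succeeded with a little more time.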