Thread: 12 messages, 6 authors, 2018-08-23

Re: Infinite looping observed in __offline_pages

From: Michal Hocko <mhocko@kernel.org>
Date: 2018-08-01 11:20:45
Also in: lkml

On Wed 01-08-18 21:09:39, Michael Ellerman wrote:
> Michal Hocko [off-list ref] writes:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal? Or
> > > should we allow a certain number of retries before we
> > > give up on migrating the range?
> >
> > Unfortunately not. Migration code doesn't distinguish between
> > ephemeral and permanent failures.
>
> What's to stop an ephemeral failure happening repeatedly?

If there is a short-term pin on the page that prevents the migration,
then the holder of the pin should release it and the next retry will
succeed. If the page gets freed in the meantime, it will not be
reallocated, because the pages in the range are already isolated. The
only other cause of a migration failure I can see is a complete OOM
when allocating the migration target. That is highly unlikely, and
sooner or later it would trigger the OOM killer and release some memory.

The biggest problem here is that we cannot tell ephemeral and long-term
pins apart...
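
The trade-off behind the bounded-retry question can be sketched in user
space. This is not kernel code and not a proposed patch: struct fake_page,
fake_migrate_range() and offline_with_retry_limit() are hypothetical
stand-ins, and the "pin" is modelled as a simple counter. The sketch only
shows that a fixed retry limit cannot distinguish a pin that is about to
be dropped from one that never will be.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; not a kernel structure. */
struct fake_page {
    int pins_left;  /* passes until the pin is dropped; negative = never */
    int migrated;   /* non-zero once the page has been moved */
};

/*
 * Hypothetical stand-in for do_migrate_range(): tries every page once
 * and returns how many pages could not be migrated on this pass.
 */
static size_t fake_migrate_range(struct fake_page *pages, size_t n)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++) {
        struct fake_page *p = &pages[i];

        if (p->migrated)
            continue;
        if (p->pins_left == 0) {
            p->migrated = 1;    /* no pin left, migration succeeds */
            continue;
        }
        if (p->pins_left > 0)
            p->pins_left--;     /* short-term holder gets closer to release */
        failed++;               /* still pinned on this pass */
    }
    return failed;
}

/* Retry at most max_retries passes; 0 on success, -EBUSY if giving up. */
static int offline_with_retry_limit(struct fake_page *pages, size_t n,
                                    int max_retries)
{
    assert(max_retries > 0);

    for (int attempt = 0; attempt < max_retries; attempt++) {
        if (fake_migrate_range(pages, n) == 0)
            return 0;
    }
    return -EBUSY;
}
```

In this model, a page with pins_left = 2 migrates on the third pass, so a
limit of 3 succeeds while a limit of 2 returns -EBUSY, the same result a
permanently pinned page (pins_left = -1) gives at any limit.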
-- 
Michal Hocko
SUSE Labs