Thread (24 messages) 24 messages, 7 authors, 2016-09-26

Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size

From: Michal Hocko <mhocko@kernel.org>
Date: 2016-09-21 18:20:59
Also in: lkml

On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
quoted
dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a gigantic
hugetlb page with a size > memory block size.

When no other smaller hugepage sizes are present, the VM_BUG_ON() will
trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() will not use the head
page of the compound hugetlb page which will result in a NULL hstate
from page_hstate(). list_del() would also not work well on a tail page.

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().

However, this all assumes that it is the desired behaviour to remove
a (gigantic) unused hugetlb page from the pool, just because a small
(in relation to the  hugepage size) memory block is going offline. Not
sure if this is the right thing, and it doesn't look very consistent
given that in this scenario it is _not_ possible to migrate
such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
will return false in both cases, i.e. the memory block will be reported
as removable, no matter if the hugepage that it is part of is unused or
in use.

This patch is assuming that it would be OK to remove the hugepage,
i.e. memory offline beats pre-allocated unused (gigantic) hugepages.

Any thoughts?
Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
this thread:
https://lkml.org/lkml/2016/9/13/146

Their approach (I believe) would be to fail the offline operation in
this case.  However, I could argue that failing the operation, or
dissolving the unused huge page containing the area to be offlined is
the right thing to do.
I am sorry I have noticed this thread only now. I was arguing about this
in the original thread. I would be rather reluctant to free gigantic
page just because somebody wants to offline a small part of it because
setup is really expensive and a lost page would be really hard to get
back.

I would even question the per page block offlining itself. Why would
anybody want to offline few blocks rather than the whole node? What is
the usecase here?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help