Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size
From: Michal Hocko <mhocko@kernel.org>
Date: 2016-09-21 18:20:59
Also in: lkml
On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
> On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
>> dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a list corruption and addressing exception when trying to set a memory block offline that is part (but not the first part) of a gigantic hugetlb page with a size > memory block size.
>>
>> When no other smaller hugepage sizes are present, the VM_BUG_ON() will trigger directly. In the other case we will run into an addressing exception later, because dissolve_free_huge_page() will not use the head page of the compound hugetlb page, which will result in a NULL hstate from page_hstate(). list_del() would also not work well on a tail page.
>>
>> To fix this, first remove the VM_BUG_ON() because it is wrong, and then use the compound head page in dissolve_free_huge_page().
>>
>> However, this all assumes that it is the desired behaviour to remove a (gigantic) unused hugetlb page from the pool just because a small (in relation to the hugepage size) memory block is going offline. I am not sure if this is the right thing, and it doesn't look very consistent, given that in this scenario it is _not_ possible to migrate such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages() will return false in both cases, i.e. the memory block will be reported as removable no matter whether the hugepage it is part of is unused or in use.
>>
>> This patch assumes that it would be OK to remove the hugepage, i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
>>
>> Any thoughts?
>
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in this thread: https://lkml.org/lkml/2016/9/13/146
>
> Their approach (I believe) would be to fail the offline operation in this case. However, I could argue that either failing the operation or dissolving the unused huge page containing the area to be offlined is the right thing to do.
I am sorry, I only noticed this thread now. I was arguing about this in the original thread. I would be rather reluctant to free a gigantic page just because somebody wants to offline a small part of it, because setting one up is really expensive and a lost page would be really hard to get back.

I would even question the per-memory-block offlining itself. Why would anybody want to offline a few blocks rather than the whole node? What is the use case here?

-- 
Michal Hocko
SUSE Labs