Thread (14 messages, 2 authors, 2014-05-12)

Re: Bug in reclaim logic with exhausted nodes?

From: Nishanth Aravamudan <hidden>
Date: 2014-03-27 20:34:16
Also in: linux-mm

Hi Christoph,

On 25.03.2014 [13:25:30 -0500], Christoph Lameter wrote:
On Tue, 25 Mar 2014, Nishanth Aravamudan wrote:
On power, very early, we find the 16G pages (gpages in the powerpc arch
code) in the device-tree:

early_setup ->
	early_init_mmu ->
		htab_initialize ->
			htab_init_page_sizes ->
				htab_dt_scan_hugepage_blocks ->
					memblock_reserve
						which marks the memory
						as reserved
					add_gpage
						which saves the address
						off so that future calls to
						alloc_bootmem_huge_page() can return it

hugetlb_init ->
	hugetlb_init_hstates ->
		hugetlb_hstate_alloc_pages ->
			alloc_bootmem_huge_page
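A minimal userspace sketch of the two-phase pattern in the call chains above follows. It assumes the early scan records the address of each already-reserved block in a fixed-size array, and that the later hugetlb path pops one entry per call. `record_gpage()`, `take_gpage()` and `MAX_GPAGES` are hypothetical names, not the powerpc implementation. The sketch also assumes 0 is never a valid block address, so it can signal "none left".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; the real arch code sizes this differently. */
#define MAX_GPAGES 16

/* Addresses recorded during the early device-tree scan. */
static uint64_t gpage_addrs[MAX_GPAGES];
static size_t nr_gpages;

/* Phase 1 (early boot): remember a block that has already been
 * reserved, so that only the hugepage path will ever use it.
 * Returns -1 if there is no room left to track it. */
static int record_gpage(uint64_t addr)
{
	assert(addr != 0);	/* 0 is reserved as the "empty" marker */
	if (nr_gpages >= MAX_GPAGES)
		return -1;
	gpage_addrs[nr_gpages++] = addr;
	return 0;
}

/* Phase 2 (hugetlb init): hand out one recorded block per call,
 * most recently recorded first; returns 0 once all are consumed. */
static uint64_t take_gpage(void)
{
	if (nr_gpages == 0)
		return 0;
	return gpage_addrs[--nr_gpages];
}
```

Once every recorded block has been taken, nothing returns that memory to the general allocator. This is why the node ends up with memory that is never free.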
Not sure if I understand that correctly.
Basically this is present memory that is "reserved" for the 16GB usage
per the LPAR configuration. We honor that configuration in Linux based
upon the contents of the device-tree. It just so happens in the
configuration from my original e-mail that a consequence of this is that
a NUMA node has memory (topologically), but none of that memory is free,
nor will it ever be free.
Well, don't do that.
Perhaps, in this case, we could just remove that node from the N_MEMORY
mask? Memory allocations will never succeed from the node, and we can
never free these 16GB pages. It is really not any different than a
memoryless node *except* when you are using the 16GB pages.
That looks to be the correct way to handle things. Maybe mark the node as
offline or somehow not present so that the kernel ignores it.
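Here is a rough sketch of the criterion being proposed. It assumes hypothetical per-node counts of present pages and permanently reserved pages. A node only gets its bit in an N_MEMORY-style mask if some of its memory can ever be handed out. `struct node_info` and `usable_memory_mask()` are illustrative names, not kernel APIs.

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical; fits in one unsigned long mask */

/* Hypothetical per-node accounting: pages the node physically has,
 * and how many of them are permanently reserved (e.g. 16GB pages). */
struct node_info {
	unsigned long present_pages;
	unsigned long reserved_pages;
};

/* Build a mask in the spirit of N_MEMORY: bit n is set only if node n
 * has at least one page the page allocator could ever hand out.
 * A node whose memory is entirely reserved is treated like a
 * memoryless node. */
static unsigned long usable_memory_mask(const struct node_info *nodes,
					int nr_nodes)
{
	unsigned long mask = 0;

	assert(nr_nodes <= MAX_NODES);
	for (int n = 0; n < nr_nodes; n++) {
		assert(nodes[n].reserved_pages <= nodes[n].present_pages);
		if (nodes[n].present_pages > nodes[n].reserved_pages)
			mask |= 1UL << n;
	}
	return mask;
}
```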
This is a SLUB condition:

mm/slub.c::early_kmem_cache_node_alloc():
...
        page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
...
        if (page_to_nid(page) != node) {
                printk(KERN_ERR "SLUB: Unable to allocate memory from "
                                "node %d\n", node);
                printk(KERN_ERR "SLUB: Allocating a useless per node structure "
                                "in order to be able to continue\n");
        }
...
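To make the condition concrete, here is a toy allocator. All of its names are hypothetical and do not belong to SLUB or the page allocator. It prefers the requested node and otherwise falls back to the lowest-numbered node with free pages. The last helper has the same shape as the page_to_nid(page) != node test: the allocation succeeded, just not on the node that was requested.

```c
#include <assert.h>

#define NR_NODES 4	/* hypothetical node count */

/* Hypothetical free-page counts per node after boot-time reservations. */
static unsigned long free_pages[NR_NODES];

/* Toy stand-in for a node-preferring allocator: try the requested node
 * first, then fall back to the lowest-numbered node with free pages.
 * Returns the node the "page" came from, or -1 if memory is exhausted. */
static int alloc_page_on(int node)
{
	assert(node >= 0 && node < NR_NODES);
	if (free_pages[node] > 0) {
		free_pages[node]--;
		return node;
	}
	for (int n = 0; n < NR_NODES; n++) {
		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
	}
	return -1;
}

/* Same shape as the SLUB check: nonzero when an allocation for `node`
 * succeeded but the page belongs to a different node. */
static int got_foreign_page(int node)
{
	int nid = alloc_page_on(node);

	return nid >= 0 && nid != node;
}
```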

Since this is quite early and we have not set up the nodemasks yet,
does it make sense to have a temporary init-time nodemask that we set
bits in here, and then "fix up" those nodes when we set up the
nodemasks?
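Below is one possible shape for that temporary mask. It assumes node numbers fit in a single unsigned long. The early path records a bit whenever it had to fall back, and a later fixup strips those nodes from the memory mask once that mask exists. `early_fallback_nodes`, `note_early_fallback()` and `fixup_memory_mask()` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical init-time mask: bit n is set when an early allocation
 * for node n had to be satisfied from some other node. */
static unsigned long early_fallback_nodes;

/* Called from the early path when the returned page's node differs
 * from the requested one (the page_to_nid() check above). */
static void note_early_fallback(int node)
{
	assert(node >= 0 && node < (int)(CHAR_BIT * sizeof(unsigned long)));
	early_fallback_nodes |= 1UL << node;
}

/* Called once the real node masks exist: drop every recorded node
 * from the memory mask so later code treats it as memoryless. */
static unsigned long fixup_memory_mask(unsigned long memory_mask)
{
	unsigned long fixed = memory_mask & ~early_fallback_nodes;

	early_fallback_nodes = 0;	/* temporary mask is no longer needed */
	return fixed;
}
```

Clearing the temporary mask inside the fixup keeps the fixup one-shot. A second call leaves the mask untouched.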

Thanks,
Nish