Thread (22 messages) 22 messages, 7 authors, 2007-10-17

Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask

From: Nishanth Aravamudan <hidden>
Date: 2007-10-09 03:17:42
Also in: lkml

On 08.10.2007 [18:56:05 -0700], Christoph Lameter wrote:
On Mon, 8 Oct 2007, Nishanth Aravamudan wrote:
quoted
quoted
 struct page * fastcall
 __alloc_pages(gfp_t gfp_mask, unsigned int order,
 		struct zonelist *zonelist)
 {
+	/*
+	 * Use a temporary nodemask for __GFP_THISNODE allocations. If the
+	 * cost of allocating on the stack or the stack usage becomes
+	 * noticable, allocate the nodemasks per node at boot or compile time
+	 */
+	if (unlikely(gfp_mask & __GFP_THISNODE)) {
+		nodemask_t nodemask;
+
+		return __alloc_pages_internal(gfp_mask, order,
+				zonelist, nodemask_thisnode(&nodemask));
+	}
+
 	return __alloc_pages_internal(gfp_mask, order, zonelist, NULL);
 }
<snip>

So alloc_pages_node() calls here and for THISNODE allocations, we go ask
nodemask_thisnode() for a nodemask...
Hmmmm... nodemask_thisnode needs to be passed the zonelist.
quoted
And nodemask_thisnode() always gives us a nodemask with only the node
the current process is running on set, I think?
Right.

quoted
That seems really wrong -- and would explain what Lee was seeing while
using my patches for the hugetlb pool allocator to use THISNODE
allocations. All the allocations would end up coming from whatever node
the process happened to be running on. This obviously messes up hugetlb
accounting, as I rely on THISNODE requests returning NULL if they go
off-node.

I'm not sure how this would be fixed, as __alloc_pages() no longer has
the nid to set in the mask.

Am I wrong in my analysis?
No you are right on target. The thisnode function must determine the
node from the first zone of the zonelist.
It seems like I would zonelist_node_idx() for this, along the lines of:

	static nodemask_t *nodemask_thisnode(nodemask_t *nodemask,
		struct zonelist *zonelist)
	{
		int nid = zonelist_node_idx(zonelist);
		/* Build a nodemask for just this node */
		nodes_clear(*nodemask);
		node_set(nid, *nodemask);

		return nodemask;
	}

But I think I need to check that zonelist->_zonerefs->zone is !NULL, given this
definition of zonelist_node_idx()

	static inline int zonelist_node_idx(struct zoneref *zoneref)
	{
	#ifdef CONFIG_NUMA
		/* zone_to_nid not available in this context */
		return zoneref->zone->node;
	#else
		return 0;
	#endif /* CONFIG_NUMA */
	}

and this comment in __alloc_pages_internal():

	....
	z = zonelist->_zonerefs;  /* the list of zones suitable for gfp_mask */

	if (unlikely(!z->zone)) {
		/*
		 * Happens if we have an empty zonelist as a result of
		 * GFP_THISNODE being used on a memoryless node
		 */
		return NULL;
	}
	...

It seems like zoneref->zone may be NULL in zonelist_node_idx()? Maybe
someone else should look into resolving this :)

Thanks,
Nish

-- 
Nishanth Aravamudan [off-list ref]
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help