Re: [PATCH 6/6] Use one zonelist that is filtered by nodemask
From: Nishanth Aravamudan <hidden>
Date: 2007-10-09 03:17:42
Also in:
lkml
On 08.10.2007 [18:56:05 -0700], Christoph Lameter wrote:
On Mon, 8 Oct 2007, Nishanth Aravamudan wrote:quoted
quoted
struct page * fastcall __alloc_pages(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist) { + /* + * Use a temporary nodemask for __GFP_THISNODE allocations. If the + * cost of allocating on the stack or the stack usage becomes + * noticable, allocate the nodemasks per node at boot or compile time + */ + if (unlikely(gfp_mask & __GFP_THISNODE)) { + nodemask_t nodemask; + + return __alloc_pages_internal(gfp_mask, order, + zonelist, nodemask_thisnode(&nodemask)); + } + return __alloc_pages_internal(gfp_mask, order, zonelist, NULL); }<snip> So alloc_pages_node() calls here and for THISNODE allocations, we go ask nodemask_thisnode() for a nodemask...Hmmmm... nodemask_thisnode needs to be passed the zonelist.quoted
And nodemask_thisnode() always gives us a nodemask with only the node the current process is running on set, I think?Right.quoted
That seems really wrong -- and would explain what Lee was seeing while using my patches for the hugetlb pool allocator to use THISNODE allocations. All the allocations would end up coming from whatever node the process happened to be running on. This obviously messes up hugetlb accounting, as I rely on THISNODE requests returning NULL if they go off-node. I'm not sure how this would be fixed, as __alloc_pages() no longer has the nid to set in the mask. Am I wrong in my analysis?No you are right on target. The thisnode function must determine the node from the first zone of the zonelist.
It seems like I would zonelist_node_idx() for this, along the lines of:
static nodemask_t *nodemask_thisnode(nodemask_t *nodemask,
struct zonelist *zonelist)
{
int nid = zonelist_node_idx(zonelist);
/* Build a nodemask for just this node */
nodes_clear(*nodemask);
node_set(nid, *nodemask);
return nodemask;
}
But I think I need to check that zonelist->_zonerefs->zone is !NULL, given this
definition of zonelist_node_idx()
static inline int zonelist_node_idx(struct zoneref *zoneref)
{
#ifdef CONFIG_NUMA
/* zone_to_nid not available in this context */
return zoneref->zone->node;
#else
return 0;
#endif /* CONFIG_NUMA */
}
and this comment in __alloc_pages_internal():
....
z = zonelist->_zonerefs; /* the list of zones suitable for gfp_mask */
if (unlikely(!z->zone)) {
/*
* Happens if we have an empty zonelist as a result of
* GFP_THISNODE being used on a memoryless node
*/
return NULL;
}
...
It seems like zoneref->zone may be NULL in zonelist_node_idx()? Maybe
someone else should look into resolving this :)
Thanks,
Nish
--
Nishanth Aravamudan [off-list ref]
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>