Thread (3 messages) 3 messages, 2 authors, 2024-01-31

Re: [PATCH v4 3/3] mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving

From: Huang, Ying <hidden>
Date: 2024-01-31 09:22:06
Also in: linux-doc, linux-fsdevel, linux-mm, lkml

Possibly related (same subject, not in this thread)

Gregory Price [off-list ref] writes:
On Wed, Jan 31, 2024 at 02:43:12PM +0800, Huang, Ying wrote:
quoted
Gregory Price [off-list ref] writes:
quoted
 
+static unsigned int weighted_interleave_nodes(struct mempolicy *policy)
+{
+	unsigned int node = current->il_prev;
+
+	if (!current->il_weight || !node_isset(node, policy->nodes)) {
+		node = next_node_in(node, policy->nodes);
+		/* can only happen if nodemask is being rebound */
+		if (node == MAX_NUMNODES)
+			return node;
I feel a little unsafe to read policy->nodes at same time of writing in
rebound.  Is it better to use a seqlock to guarantee its consistency?
It's unnecessary to be a part of this series though.
I think this is handled already? It is definitely an explicit race
condition that is documented elsewhere:

/*
 * mpol_rebind_policy - Migrate a policy to a different set of nodes
 *
 * Per-vma policies are protected by mmap_lock. Allocations using per-task
 * policies are protected by task->mems_allowed_seq to prevent a premature
 * OOM/allocation failure due to parallel nodemask modification.
 */
Thanks for pointing this out!

If we use task->mems_allowed_seq reader side in
weighted_interleave_nodes() we can guarantee the consistency of
policy->nodes.  That may be not deserved, because it's not a big deal to
allocate 1 page in a wrong node.

It makes more sense to do that in
alloc_pages_bulk_array_weighted_interleave(), because a lot of pages may
be allocated there.
example from slub:

do {
	cpuset_mems_cookie = read_mems_allowed_begin();
	zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
	...
} while (read_mems_allowed_retry(cpuset_mems_cookie));

quick perusal through other allocators, show similar checks.

page_alloc.c  -  check_retry_cpusetset()
filemap.c     -  filemap_alloc_folio()

If we ever want mempolicy to be swappable from outside the current task
context, this will have to change most likely - but that's another
feature for another day.
--
Best Regards,
Huang, Ying
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help