Gregory Price [off-list ref] writes:
On Wed, Jan 31, 2024 at 02:43:12PM +0800, Huang, Ying wrote:
quoted
Gregory Price [off-list ref] writes:
quoted
+static unsigned int weighted_interleave_nodes(struct mempolicy *policy)
+{
+ unsigned int node = current->il_prev;
+
+ if (!current->il_weight || !node_isset(node, policy->nodes)) {
+ node = next_node_in(node, policy->nodes);
+ /* can only happen if nodemask is being rebound */
+ if (node == MAX_NUMNODES)
+ return node;
I feel a little unsafe to read policy->nodes at same time of writing in
rebound. Is it better to use a seqlock to guarantee its consistency?
It's unnecessary to be a part of this series though.
I think this is handled already? It is definitely an explicit race
condition that is documented elsewhere:
/*
* mpol_rebind_policy - Migrate a policy to a different set of nodes
*
* Per-vma policies are protected by mmap_lock. Allocations using per-task
* policies are protected by task->mems_allowed_seq to prevent a premature
* OOM/allocation failure due to parallel nodemask modification.
*/
Thanks for pointing this out!
If we use task->mems_allowed_seq reader side in
weighted_interleave_nodes() we can guarantee the consistency of
policy->nodes. That may be not deserved, because it's not a big deal to
allocate 1 page in a wrong node.
It makes more sense to do that in
alloc_pages_bulk_array_weighted_interleave(), because a lot of pages may
be allocated there.
example from slub:
do {
cpuset_mems_cookie = read_mems_allowed_begin();
zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
...
} while (read_mems_allowed_retry(cpuset_mems_cookie));
quick perusal through other allocators, show similar checks.
page_alloc.c - check_retry_cpusetset()
filemap.c - filemap_alloc_folio()
If we ever want mempolicy to be swappable from outside the current task
context, this will have to change most likely - but that's another
feature for another day.
--
Best Regards,
Huang, Ying