Re: [PATCH v5 2/3] mm/mempolicy: add set_mempolicy_home_node syscall
From: Aneesh Kumar K.V <hidden>
Date: 2021-11-29 13:47:45
Also in:
linux-mm
Michal Hocko [off-list ref] writes:
On Mon 29-11-21 16:16:05, Aneesh Kumar K.V wrote:quoted
Michal Hocko [off-list ref] writes:quoted
On Tue 16-11-21 12:12:37, Aneesh Kumar K.V wrote:[...]quoted
quoted
quoted
+SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, len, + unsigned long, home_node, unsigned long, flags) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + struct mempolicy *new; + unsigned long vmstart; + unsigned long vmend; + unsigned long end; + int err = -ENOENT; + + if (start & ~PAGE_MASK) + return -EINVAL; + /* + * flags is used for future extension if any. + */ + if (flags != 0) + return -EINVAL; + + if (!node_online(home_node)) + return -EINVAL;You really want to check the home_node before dereferencing the mask.Any reason why we want to check for home node first?Because the given node is an index to node_states[N_ONLINE] bitmap. I do not think we do range checking there.
Will add this if (home_node >= MAX_NUMNODES || !node_online(home_node)) return -EINVAL;
quoted
quoted
quoted
+ len = (len + PAGE_SIZE - 1) & PAGE_MASK; + end = start + len; + + if (end < start) + return -EINVAL; + if (end == start) + return 0; + mmap_write_lock(mm); + vma = find_vma(mm, start); + for (; vma && vma->vm_start < end; vma = vma->vm_next) { + + vmstart = max(start, vma->vm_start); + vmend = min(end, vma->vm_end); + new = mpol_dup(vma_policy(vma)); + if (IS_ERR(new)) { + err = PTR_ERR(new); + break; + } + /* + * Only update home node if there is an existing vma policy + */ + if (!new) + continue;Your changelog only mentions MPOL_BIND and MPOL_PREFERRED_MANY as supported but you seem to be applying the home node to all existing policiesoThe restriction is done in policy_node.@@ -1801,6 +1856,11 @@ static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); } + if ((policy->mode == MPOL_BIND || + policy->mode == MPOL_PREFERRED_MANY) && + policy->home_node != NUMA_NO_NODE) + return policy->home_node; + return nd; }But you do allow to set the home node also for other policies and that means that a default MPOL_INTERLEAVE would be different from the one with home_node set up even though they behave exactly the same.
I agree that there is no error returned if we try to set the home_node for other memory policies. But there should not be any behaviour differences. We ignore home_node for policies other than BIND and PREFERRED_MANY. The reason I avoided erroring out for other policies was to simplify the error handling. For example, for a range of addr with a mix of memory policy MPOLD_BIND and MPOL_INTERLEAVE what should be the state after the above system call? We could say, we ignore setting home_node for vma with policy MPOL_INTERLEAVE and leave the home node set for vma with policy MPOL_BIND. Or should we make the system call return error also undoing the changes done for vmas for which we have set the home_node? -aneesh