Re: [v3 PATCH 2/3] mm/mempolicy: don't handle MPOL_LOCAL like a fake MPOL_PREFERRED policy
From: Michal Hocko <mhocko@suse.com>
Date: 2021-06-01 08:44:44
Also in:
lkml
On Mon 31-05-21 22:05:55, Feng Tang wrote:
MPOL_LOCAL policy has been setup as a real policy, but it is still handled like a faked POL_PREFERRED policy with one internal MPOL_F_LOCAL flag bit set, and there are many places having to judge the real 'prefer' or the 'local' policy, which are quite confusing. In current code, there are 4 cases that MPOL_LOCAL are used: 1. user specifies 'local' policy 2. user specifies 'prefer' policy, but with empty nodemask 3. system 'default' policy is used 4. 'prefer' policy + valid 'preferred' node with MPOL_F_STATIC_NODES flag set, and when it is 'rebind' to a nodemask which doesn't contains the 'preferred' node, it will perform as 'local' policy So make 'local' a real policy instead of a fake 'prefer' one, and kill MPOL_F_LOCAL bit, which can greatly reduce the confusion for code reading. For case 4, the logic of mpol_rebind_preferred() is confusing, as Michal Hocko pointed out: " I do believe that rebinding preferred policy is just bogus and it should be dropped altogether on the ground that a preference is a mere hint from userspace where to start the allocation. Unless I am missing something cpusets will be always authoritative for the final placement. The preferred node just acts as a starting point and it should be really preserved when cpusets changes. Otherwise we have a very subtle behavior corner cases. " So dump all the tricky transformation between 'prefer' and 'local', and just record the new nodemask of rebinding. Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Feng Tang <redacted>
I like this very much! It simplifies a tricky code and also a very dubious behavior. I would like to hear from others whether there might be some userspace depending on this obscure behavior though. One never knows... Some more notes/questions below [...]
quoted hunk ↗ jump to hunk
@@ -239,25 +240,19 @@ static int mpol_set_nodemask(struct mempolicy *pol, cpuset_current_mems_allowed, node_states[N_MEMORY]); VM_BUG_ON(!nodes); - if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes)) - nodes = NULL; /* explicit local allocation */ - else { - if (pol->flags & MPOL_F_RELATIVE_NODES) - mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1); - else - nodes_and(nsc->mask2, *nodes, nsc->mask1); - if (mpol_store_user_nodemask(pol)) - pol->w.user_nodemask = *nodes; - else - pol->w.cpuset_mems_allowed = - cpuset_current_mems_allowed; - } + if (pol->flags & MPOL_F_RELATIVE_NODES) + mpol_relative_nodemask(&nsc->mask2, nodes, &nsc->mask1); + else + nodes_and(nsc->mask2, *nodes, nsc->mask1);
Maybe I've just got lost here but why don't you need to check for the local policy anymore? mpol_new will take care of the MPOL_PREFERRED && nodes_empty special but why do we want/need all this for a local policy at all?
- if (nodes) - ret = mpol_ops[pol->mode].create(pol, &nsc->mask2); + if (mpol_store_user_nodemask(pol)) + pol->w.user_nodemask = *nodes; else - ret = mpol_ops[pol->mode].create(pol, NULL); + pol->w.cpuset_mems_allowed = + cpuset_current_mems_allowed;
please use a single line. This is just harder to read. You will cross the line limit but readability should be preferred here. [...] I haven't spotted anything else. -- Michal Hocko SUSE Labs