Thread (24 messages) 24 messages, 2 authors, 2021-05-31

Re: [PATCH v1 4/4] mm/mempolicy: kill MPOL_F_LOCAL bit

From: Feng Tang <hidden>
Date: 2021-05-31 07:33:04
Also in: lkml

On Mon, May 31, 2021 at 09:00:25AM +0200, Michal Hocko wrote:
On Fri 28-05-21 12:39:54, Feng Tang wrote:
quoted
On Thu, May 27, 2021 at 05:34:56PM +0200, Michal Hocko wrote:
quoted
On Thu 27-05-21 21:34:36, Feng Tang wrote:
quoted
On Thu, May 27, 2021 at 02:26:24PM +0200, Michal Hocko wrote:
quoted
On Thu 27-05-21 20:10:41, Feng Tang wrote:
quoted
On Thu, May 27, 2021 at 10:20:08AM +0200, Michal Hocko wrote:
quoted
On Wed 26-05-21 13:01:42, Feng Tang wrote:
quoted
Now the only remaining case of a real 'local' policy faked by
'prefer' policy plus MPOL_F_LOCAL bit is:

A valid 'prefer' policy with a valid 'preferred' node is 'rebind'
to a nodemask which doesn't contains the 'preferred' node, then it
will handle allocation with 'local' policy.

Add a new 'MPOL_F_LOCAL_TEMP' bit for this case, and kill the
MPOL_F_LOCAL bit, which could simplify the code much.
As I've pointed out in the reply to the previous patch. It would have
been much better if most of the MPOL_F_LOCAL usage was gone by this
patch.

I also dislike a new MPOL_F_LOCAL_TEMP. This smells like sneaking the
hack back in after you have painstakingly removed it. So this looks like
a step backwards to me. I also do not understand why do we need the
rebind callback for local policy at all. There is no node mask for local
so what is going on here?
This is the special case 4 for 'perfer' policy with MPOL_F_STATIC_NODES
flag set, say it prefer node 1, when it is later 'refind' to a new
nodemask node 2-3, according to current code it will be add the
MPOL_F_LOCAL bit and performs 'local' policy acctually. And in future
it is 'rebind' again with a nodemask 1-2, it will be restored back
to 'prefer' policy with preferred node 1.
Honestly I still do not follow the actual problem. 
I was confused too, and don't know the original thought behind it. This
case 4 was just imagined by reading the code.
quoted
A preferred node is a
_hint_. If you rebind the task to a different cpuset then why should we
actually care? The allocator will fallback to the closest node according
to the distance metric. Maybe the original code was trying to handle
that in some way but I really do fail to understand that code and I
strongly suspect it is more likely to overengineered rather than backed
by a real usecase. I might be wrong here but then this is an excellent
opportunity to clarify all those subtleties.
From the code, the original special handling may be needed in 3 cases:
    get_policy_nodemask()
    policy_node()
    mempolicy_slab_node()
to not return the preset prefer_nid.
I am sorry but I do not follow. What is actually wrong if the preferred
node is outside of the cpuset nodemask?
Sorry, I didn't make it clear. With current code logic, it will perform
as 'local' policy, but its mode is kept as 'prefer', so the code still
has these tricky bit checking when these APIs are called for this policy.
I agree with you that these ping-pong rebind() may be over engineering,
so for this case can we just change the policy from 'prefer' to 'local',
and drop the tricky bit manipulation, as the 'prefer' is just a hint,
if these rebind misses the target node, there is no need to stick with
the 'prefer' policy?
Again. I really do not understand why we should rebind or mark as local
anything here. Is this a documented/expected behavior? What if somebody
just changes the cpuset to include the preferred node again. Is it
expected to have local preference now?
Good point! Marking 'local' doesn't solve the whole issue. And I didn't 
find any document defining the semantics.
I can see you have posted a newer version which I haven't seen yet but
this is really better to get resolved before building up more on top.
And let me be explicit. I do believe that rebinding preferred policy is
just bogus and it should be dropped altogether on the ground that a 
preference is a mere hint from userspace where to start the allocation. 
Yes, the current mpol_rebind_preferred()'s logic is confusing. Let me
try to understand it correctly, are you suggesting to do nothing for
'prefer's rebinding regarding MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES,
while just setting 'pol->w.cpuset_mems_allowed' to the new nodemask?

Thanks,
Feng
Unless I am missing something cpusets will be always authoritative for
the final placement. The preferred node just acts as a starting point
and it should be really preserved when cpusets changes. Otherwise we
have a very subtle behavior corner cases.
-- 
Michal Hocko
SUSE Labs
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help