Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

From: Vlastimil Babka <hidden>
Date: 2017-05-18 10:04:27
Also in: cgroups, linux-mm, lkml

On 05/17/2017 04:48 PM, Christoph Lameter wrote:
> On Wed, 17 May 2017, Michal Hocko wrote:
>
>>>> So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
>>>> case in a raceless way?
>>>
>>> You don't have to do that if you do not create an empty mempolicy in the
>>> first place. The current kernel code avoids that by first allowing access
>>> to the new set of nodes and removing the old ones from the set when done.
>>
>> which is racy, as Vlastimil pointed out. If we simply fail such an
>> allocation the failure will go up the call chain until we hit the OOM
>> killer due to VM_FAULT_OOM. How would you want to handle that?
>
> The race is where? If you expand the node set during the move of the
> application then you are safe in terms of the legacy apps that did not
> include static bindings.

No, that expand/shrink by itself doesn't work against a parallel
get_page_from_freelist() going through a zonelist. Consider moving from
node 0 to node 1, with the zonelist containing nodes 1 and 0 in that order:

- mempolicy mask is 0
- zonelist iteration checks node 1, it's not allowed, skip
- mempolicy mask is 0,1 (expand)
- mempolicy mask is 1 (shrink)
- zonelist iteration checks node 0, it's not allowed, skip
- OOM
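
The interleaving above can be replayed in a small userspace model. This is
only an illustrative sketch, not kernel code: nodemask_sim, struct
mask_update, walk_zonelist(), race_demo() and no_race_demo() are all
hypothetical names, and the model assumes the walker re-reads the mask at
every zonelist entry, as in the steps listed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a nodemask: bit n set means node n is allowed. */
typedef unsigned int nodemask_sim;

#define NODE(n) (1u << (n))

/* A mask change that lands just before the walker inspects entry before_step. */
struct mask_update {
	size_t before_step;
	nodemask_sim new_mask;
};

/*
 * Walk the zonelist in order, checking each node against the mask as it is
 * at that moment. Updates must be sorted by before_step.
 * Returns the first allowed node, or -1 if every entry was skipped.
 */
static int walk_zonelist(const int *zonelist, size_t len, nodemask_sim mask,
			 const struct mask_update *updates, size_t nr_updates)
{
	size_t u = 0;

	for (size_t step = 0; step < len; step++) {
		/* Apply every update scheduled to land before this step. */
		while (u < nr_updates && updates[u].before_step <= step)
			mask = updates[u++].new_mask;

		assert(zonelist[step] >= 0 && zonelist[step] < 32);
		if (mask & NODE(zonelist[step]))
			return zonelist[step];
	}
	return -1;	/* nothing allowed: the caller would see an OOM */
}

/* Move from node 0 to node 1 while the walker is between the two entries. */
static int race_demo(void)
{
	const int zonelist[] = { 1, 0 };
	const struct mask_update updates[] = {
		{ 1, NODE(0) | NODE(1) },	/* expand */
		{ 1, NODE(1) },			/* shrink */
	};

	return walk_zonelist(zonelist, 2, NODE(0), updates, 2);
}

/* Same walk with no concurrent update: node 0 is found. */
static int no_race_demo(void)
{
	const int zonelist[] = { 1, 0 };

	return walk_zonelist(zonelist, 2, NODE(0), NULL, 0);
}
```

In this model race_demo() returns -1 even though the mask was never empty at
any point in time, while no_race_demo() returns node 0.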
