Re: Bug in reclaim logic with exhausted nodes?

From: Christoph Lameter <hidden>
Date: 2014-04-03 16:41:40
Also in: linux-mm

On Mon, 31 Mar 2014, Nishanth Aravamudan wrote:

> > > Yep. The node exists, it's just fully exhausted at boot (due to the
> > > presence of 16GB pages reserved at boot-time).
> >
> > Well if you want us to support that then I guess you need to propose
> > patches to address this issue.
>
> I'd appreciate a bit more guidance? I'm suggesting that in this case the
> node functionally has no memory. So the page allocator should not allow
> allocations from it -- except (I need to investigate this still)
> userspace accessing the 16GB pages on that node, but that, I believe,
> doesn't go through the page allocator at all, it's all from hugetlb
> interfaces. It seems to me there is a bug in SLUB that we are noting
> that we have a useless per-node structure for a given nid, but not
> actually preventing requests to that node or reclaim because of those
> allocations.

Well if you can address that without impacting the fastpath then we could
do this. Otherwise we would need a fake structure here to avoid adding
checks to the fastpath

> I think there is a logical bug (even if it only occurs in this
> particular corner case) where if reclaim progresses for a THISNODE
> allocation, we don't check *where* the reclaim is progressing, and thus
> may falsely be indicating that we have done some progress when in fact
> the allocation that is causing reclaim will not possibly make any more
> progress.

Ok maybe we could address this corner case. How would you do this?
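
For readers following the thread, the corner case can be pictured with a
small userspace toy model. This is only a sketch under stated assumptions,
not kernel code and not a proposed patch: struct toy_node, toy_reclaim(),
should_retry_naive() and should_retry_node_aware() are hypothetical names
invented for illustration, and the two-node layout is made up. It shows how
a retry decision based on total reclaim progress can keep retrying a
THISNODE-style request against an exhausted node, while a decision that
looks at where the progress happened does not.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 2

/* Hypothetical toy model of a NUMA node; not a kernel structure. */
struct toy_node {
    long free_pages;        /* pages currently free on this node */
    long reclaimable_pages; /* pages reclaim could still free here */
};

/*
 * Try to reclaim up to nr pages from every node.
 * progress[nid] records how many pages were freed on each node.
 * Returns the total number of pages freed across all nodes.
 */
static long toy_reclaim(struct toy_node *nodes, long nr, long *progress)
{
    long total = 0;

    for (int nid = 0; nid < NR_NODES; nid++) {
        long got = nodes[nid].reclaimable_pages < nr ?
                   nodes[nid].reclaimable_pages : nr;

        nodes[nid].reclaimable_pages -= got;
        nodes[nid].free_pages += got;
        progress[nid] = got;
        total += got;
    }
    return total;
}

/* Naive check: any progress on any node counts as a reason to retry. */
static bool should_retry_naive(const long *progress)
{
    long total = 0;

    for (int nid = 0; nid < NR_NODES; nid++)
        total += progress[nid];
    return total > 0;
}

/*
 * Node-aware check (assumed fix, for illustration only): when the
 * request is restricted to one node, only progress on that node counts.
 */
static bool should_retry_node_aware(const long *progress, int target_nid,
                                    bool thisnode)
{
    assert(target_nid >= 0 && target_nid < NR_NODES);

    if (thisnode)
        return progress[target_nid] > 0;
    return should_retry_naive(progress);
}
```

With node 0 fully exhausted (nothing free, nothing reclaimable) and node 1
holding reclaimable pages, one toy_reclaim() pass frees pages only on node
1. The naive check then says "retry" for a request pinned to node 0, which
can never succeed, while the node-aware check says "stop". Whether a check
like this can live outside the allocator fastpath is exactly the open
question in the thread.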
