Re: Bug in reclaim logic with exhausted nodes?
From: Christoph Lameter <hidden>
Date: 2014-04-03 16:41:40
Also in:
linux-mm
On Mon, 31 Mar 2014, Nishanth Aravamudan wrote:
Yep. The node exists, it's just fully exhausted at boot (due to the presence of 16GB pages reserved at boot-time).
Well, if you want us to support that, then I guess you need to propose patches to address this issue.
I'd appreciate a bit more guidance. I'm suggesting that in this case the node functionally has no memory, so the page allocator should not allow allocations from it. The one exception (which I still need to investigate) is userspace accessing the 16GB pages on that node, but I believe that doesn't go through the page allocator at all; it all goes through the hugetlb interfaces. It seems to me there is a bug in SLUB: we note that we have a useless per-node structure for a given nid, but we do not actually prevent requests to that node, or the reclaim triggered by those allocations.
Well, if you can address that without impacting the fastpath, then we could do this. Otherwise we would need a fake structure here to avoid adding checks to the fastpath.
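One possible reading of the "fake structure" idea, as a minimal userspace sketch: nodes without usable memory point at a shared, always-empty placeholder, so the fastpath sees an ordinary empty node and needs no extra NULL check. Every name here (toy_cache, toy_cache_node, toy_empty_node, toy_take_partial) is hypothetical and chosen for illustration; this is not SLUB's actual data layout or code.

```c
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical per-node bookkeeping; not the kernel's kmem_cache_node. */
struct toy_cache_node {
	unsigned long nr_partial;	/* partially filled slabs on this node */
};

/* Shared placeholder for nodes with no usable memory: always empty. */
static struct toy_cache_node toy_empty_node = { .nr_partial = 0 };

struct toy_cache {
	struct toy_cache_node *node[TOY_MAX_NODES];
};

/* Point each node slot at its real structure or at the placeholder. */
static void toy_cache_init(struct toy_cache *c, struct toy_cache_node *real,
			   const int *has_memory)
{
	for (int i = 0; i < TOY_MAX_NODES; i++)
		c->node[i] = has_memory[i] ? &real[i] : &toy_empty_node;
}

/*
 * Fastpath: no NULL check is needed because every slot is valid.
 * Returns 1 if a partial slab was taken, 0 to fall back to the slow path.
 */
static int toy_take_partial(struct toy_cache *c, int nid)
{
	struct toy_cache_node *n = c->node[nid];

	if (n->nr_partial == 0)
		return 0;
	n->nr_partial--;
	return 1;
}
```

In this sketch a request against an exhausted node costs the same as a request against a node whose partial list happens to be empty; the decision about where to go next is left to the slow path.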
I think there is a logical bug (even if it only occurs in this particular corner case): if reclaim makes progress for a THISNODE allocation, we don't check *where* the reclaim is progressing, and thus may falsely indicate that we have made progress when in fact the allocation that is causing reclaim cannot possibly make any more progress.
Ok, maybe we could address this corner case. How would you do this?
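To make the corner case concrete, here is a toy userspace model of the retry logic described above. It is a sketch under stated assumptions, not kernel code and not a proposed patch: toy_node, toy_reclaim_all, toy_alloc_thisnode and the check_target flag are all hypothetical names. It compares counting reclaim progress across all nodes with counting only progress on the node the THISNODE-style allocation is bound to.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RETRIES 16

/* Toy stand-in for a NUMA node; not the kernel's pg_data_t. */
struct toy_node {
	unsigned long free_pages;	/* pages the allocator can hand out */
	unsigned long reclaimable;	/* pages reclaim could still free */
};

/* Reclaim up to nr pages from one node; returns the pages freed. */
static unsigned long toy_reclaim_node(struct toy_node *n, unsigned long nr)
{
	unsigned long freed = n->reclaimable < nr ? n->reclaimable : nr;

	n->reclaimable -= freed;
	n->free_pages += freed;
	return freed;
}

/* Reclaim from every node; also report what was freed on the target. */
static unsigned long toy_reclaim_all(struct toy_node *nodes, size_t nr_nodes,
				     size_t target, unsigned long *target_freed)
{
	unsigned long total = 0;

	*target_freed = 0;
	for (size_t i = 0; i < nr_nodes; i++) {
		unsigned long freed = toy_reclaim_node(&nodes[i], 1);

		total += freed;
		if (i == target)
			*target_freed = freed;
	}
	return total;
}

/*
 * Allocate one page strictly from 'target'. Returns the number of reclaim
 * passes performed and sets *ok to whether the allocation succeeded.
 * With check_target == false, progress on any node justifies a retry;
 * with check_target == true, only progress on the target node does.
 */
static int toy_alloc_thisnode(struct toy_node *nodes, size_t nr_nodes,
			      size_t target, bool check_target, bool *ok)
{
	int passes = 0;

	*ok = false;
	while (passes < TOY_MAX_RETRIES) {
		unsigned long target_freed, total;

		if (nodes[target].free_pages > 0) {
			nodes[target].free_pages--;
			*ok = true;
			return passes;
		}
		total = toy_reclaim_all(nodes, nr_nodes, target, &target_freed);
		passes++;
		if ((check_target ? target_freed : total) == 0)
			break;	/* no relevant progress: stop retrying */
	}
	return passes;
}
```

With node 0 fully exhausted and node 1 holding eight reclaimable pages, the all-nodes variant in this model keeps reclaiming from node 1 until nothing is left anywhere before it fails, while the target-only variant fails after a single pass. Whether a per-node progress check can be added without cost elsewhere is exactly the open question in the thread.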