Re: [RFC PATCH 1/1] mm/hugetlb mm/oom_kill: Add support for reclaiming... | linux-mm

Re: [RFC PATCH 1/1] mm/hugetlb mm/oom_kill: Add support for reclaiming hugepages on OOM events.

From: Liam R. Howlett <hidden>
Date: 2017-08-01 01:12:24

* Michal Hocko [off-list ref] [170731 10:08]:

On Mon 31-07-17 09:56:48, Liam R. Howlett wrote:

* Michal Hocko [off-list ref] [170731 05:10]:

On Fri 28-07-17 21:56:38, Liam R. Howlett wrote:

* Michal Hocko [off-list ref] [170728 08:44]:

On Fri 28-07-17 14:23:50, Michal Hocko wrote:

Other than that, hugetlb pages are not reclaimable by design and users
do rely on that. Otherwise they could consider using THP instead.

If somebody configures the initial pool too high, it is a configuration
bug. Just think about it: we do not want to reset lowmem reserves
configured by the admin just because we are hitting the OOM killer, and
yes, insanely large lowmem reserves might lead to an early OOM as well.

The case I raise is a correctly configured system which has a memory
module failure.

So you are concerned about MCEs due to failing memory modules? If so,
why do you care about hugetlb in particular?

No, I am concerned about a failed memory module.  The system will
detect certain failures, mark the memory as bad and automatically
reboot.  Upon rebooting, that module will not be used.

How do you detect/configure this? We do have the HWPoison infrastructure.

I don't right now, but I felt I was at a stage where an RFC would help
this go smoother.  I've not researched this, but offhand: dmidecode is
able to detect that a memory module is disabled.  This alone would not
indicate a failure, but if one were to see a disabled DIMM and an
invalid configuration, it might be worth pointing out on boot?
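The "invalid configuration" half of that boot-time check could start as simply as comparing the configured hugetlb pool against the memory the kernel actually sees after the reboot. Below is a minimal Python sketch under several assumptions. It reads only the default-size counters from /proc/meminfo (`MemTotal`, `HugePages_Total`, `HugePages_Surp`, `Hugepagesize`). The helper names and the 90% threshold are hypothetical, not kernel tunables. It does not attempt the dmidecode side of spotting a disabled DIMM. A userspace check like this would also only help once the system has booted far enough to run it.

```python
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Sizes stay in kB (the unit meminfo uses); HugePages_* are page counts.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def persistent_pool_share(fields: Dict[str, int]) -> float:
    """Fraction of MemTotal taken by the persistent default-size pool.

    Surplus pages are subtracted so only the admin-configured pool counts.
    """
    persistent_pages = fields["HugePages_Total"] - fields.get("HugePages_Surp", 0)
    pool_kb = persistent_pages * fields["Hugepagesize"]
    return pool_kb / fields["MemTotal"]


def check_pool(text: str, max_share: float = 0.9) -> Optional[str]:
    """Return a warning string if the pool looks too large for this boot.

    max_share is a hypothetical threshold chosen for illustration only.
    """
    share = persistent_pool_share(parse_meminfo(text))
    if share > max_share:
        return (f"hugetlb pool holds {share:.0%} of MemTotal; "
                "was memory lost since the pool was configured?")
    return None


def check_running_system(path: str = "/proc/meminfo") -> Optional[str]:
    """Apply check_pool() to the live system (Linux only)."""
    with open(path) as f:
        return check_pool(f.read())
```

On a live Linux system, `check_running_system()` applies the same test to the real /proc/meminfo. A pool sized for the original DIMM population would then show up as an unusually large share of the remaining memory.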

My focus on hugetlb is that it can stop the automatic recovery of the
system.

How?

Clarified in the thread fork. Thanks, Matthew!

Are there other reservations that should also be considered?

What about any other memory reservations made via the memmap= kernel command line option?

I've not seen any other reservation so large that a single failure
causes a failed boot due to OOM, but that doesn't mean they should be
ignored.

Modern systems will reboot and remove the memory from
the memory pool.  Linux will start to load and run out of memory.  I get
that this code has the side effect of doing what you're saying.  Do you
see this as a worthwhile feature and, if so, do you know of a better way
for me to trigger the behaviour?

I do not understand your question. Could you elaborate more please? Are
you talking about system going into OOM because of too many MCEs?

No, I'm talking about failed memory for whatever reason.  The system
reboots by a hardware means (I believe the memory controller) and
removes the memory on that failed module from the pool.  Now you
effectively have a system with less memory than before, which
invalidates your configuration.  Is it worthwhile to have Linux
successfully boot when the system attempts to recover from a failure?

Certainly, yes, but if you boot with much less memory and you want to use
hugetlb pages, then you have to reconsider and maybe even reconfigure
your workload to reflect the new conditions. So I am not really sure this
can be fully automated.

I agree.  A reconfiguration or repair is required to have optimum
performance.  Would you agree that having a functioning system is better
than a reboot loop or a hang on panic?  It's also easier to reconfigure a
system that's booting.

Nacked-by: Michal Hocko [off-list ref]

Hm. I'm not sure it's fully justified. To me, reclaiming hugetlb is
something to be considered as last resort after all other measures have
been tried.

The system can recover from the OOM killer in most cases, and there is no
real reason to break contracts which the administrator established. On
the other hand, you cannot assume correct operation of software which
depends on hugetlb pages in general. Such software might get unexpected
crashes, data corruption and what not.

My question about allowing the reclaim to happen all the time was, like
Kirill said: if there's memory that's not being used, then why panic (or
kill a task)?  I see that Michal has thought this through, though.  My
intent was to add this as a config option, but it sounds like that's
also a bad plan.

You cannot reclaim something that the administrator has asked to be
available. Sure, we can reclaim the excess if there is any, but that is
not what your patch does.

I'm looking at free_huge_pages vs. resv_huge_pages.  I thought
resv_huge_pages were the free pages that are already requested, so if
there were more free than reserved, the difference would be excess?

The terminology is a little bit confusing here. Hugetlb memory we have
committed to is reserved (e.g. by mmap), and we surely can have free
pages on top of resv_huge_pages, but that is not an excess yet. We can
have surplus pages, which would be an excess over what the admin
configured initially. See Documentation/vm/{hugetlbpage.txt,hugetlbfs_reserv.txt}
for more information.
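The distinction Michal draws can be made concrete with the four default-size counters /proc/meminfo exposes. Below is a minimal Python sketch, assuming the semantics described in hugetlbpage.txt: HugePages_Total includes surplus pages, HugePages_Free includes pages already reserved, and HugePages_Surp counts pages above the configured nr_hugepages. The `HugetlbCounters` class and its property names are hypothetical, chosen for illustration. It shows why "free minus reserved", the quantity the RFC patch compared, is not the same as excess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HugetlbCounters:
    """Default-size hugetlb counters as reported in /proc/meminfo.

    Hypothetical helper; field meanings follow hugetlbpage.txt.
    """
    total: int  # HugePages_Total: pages in the pool, surplus included
    free: int   # HugePages_Free: pool pages not yet allocated (incl. reserved)
    rsvd: int   # HugePages_Rsvd: free pages already committed, e.g. by mmap
    surp: int   # HugePages_Surp: pages above the admin-configured pool size

    @property
    def persistent(self) -> int:
        """Pool size the administrator configured (nr_hugepages)."""
        return self.total - self.surp

    @property
    def free_uncommitted(self) -> int:
        """Free pages nobody has committed to yet.

        These are still part of the configured pool, so they are NOT an
        excess in the sense used in this thread.
        """
        return self.free - self.rsvd

    @property
    def excess(self) -> int:
        """Only surplus pages exceed what the administrator configured."""
        return self.surp
```

For instance, `HugetlbCounters(total=512, free=200, rsvd=150, surp=0)` has 50 uncommitted free pages but an `excess` of 0. That is the case Michal describes: more free than reserved, yet nothing beyond what the admin asked for.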

Thank you.  I will revisit this error if the patch is considered useful
at the end of the RFC conversation.

Cheers,
Liam
