Re: OOM kills with lots of free swap

From: Luigi Semenzato <hidden>
Date: 2017-06-29 17:46:40

Well, my apologies, I haven't been able to reproduce the problem, so
there's nothing to go on here.

We had a bug (a local patch) which caused this, then I had a bug in my
test case, so I was confused.  I also have a recollection of this
happening in older kernels (3.8 I think), but I am not going to go
back that far since even if the problem exists, we have no evidence it
happens frequently.

Thanks!


On Tue, Jun 27, 2017 at 8:50 AM, Michal Hocko [off-list ref] wrote:

On Tue 27-06-17 08:22:36, Luigi Semenzato wrote:

quoted

(sorry, I forgot to turn off HTML formatting)

Thank you, I can try this on ToT, although I think that the problem is
not with the OOM killer itself but earlier---i.e. invoking the OOM
killer seems unnecessary and wrong.  Here's the question.

The general strategy for page allocation seems to be (please correct
me as needed):

1. look in the free lists
2. if that did not succeed, try to reclaim, then try again to allocate
3. keep trying as long as progress is made (i.e. something was reclaimed)
4. if no progress was made and no pages were found, invoke the OOM killer.
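
For illustration only, the four steps above can be sketched as a toy loop in C. This is not kernel code: struct toy_mem, toy_reclaim(), toy_alloc_page() and the batch size of 32 are hypothetical names and values invented for this sketch, and the real allocator slow path is far more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of memory, not a kernel structure. */
struct toy_mem {
    size_t free_pages;        /* pages currently on the free lists */
    size_t reclaimable_pages; /* pages that reclaim could still free */
};

enum toy_result { TOY_ALLOCATED, TOY_OOM };

/* Step 1: take a page from the free lists if one is available. */
static bool toy_alloc_from_freelist(struct toy_mem *m)
{
    if (m->free_pages == 0)
        return false;
    m->free_pages--;
    return true;
}

/* Step 2: reclaim up to `batch` pages; returns how many were reclaimed. */
static size_t toy_reclaim(struct toy_mem *m, size_t batch)
{
    size_t n = m->reclaimable_pages < batch ? m->reclaimable_pages : batch;

    m->reclaimable_pages -= n;
    m->free_pages += n;
    return n;
}

/* Steps 3 and 4: retry while reclaim makes progress, otherwise report OOM. */
static enum toy_result toy_alloc_page(struct toy_mem *m)
{
    for (;;) {
        if (toy_alloc_from_freelist(m))
            return TOY_ALLOCATED;
        if (toy_reclaim(m, 32) == 0)
            return TOY_OOM; /* no progress and no free pages */
    }
}
```

In this toy model "progress" is judged only by the caller's own reclaim attempt, which is the per-task notion the question below asks about.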

Yes that is the case very broadly speaking. The hard question really is
what "no progress" actually means. We use "no pages could be reclaimed"
as the indicator. We cannot blow up at the first such instance of
course because that could be too early (e.g. data under writeback
and many other details). With 4.7+ kernels this is implemented in
should_reclaim_retry. Prior to the rework we used to rely on
zone_reclaimable, which simply checked how many pages we had scanned
since the last page was freed; if that was 6 times the reclaimable
memory, we simply gave up. It had some issues described
in 0a0337e0d1d1 ("mm, oom: rework oom detection").
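
As a rough sketch of the older heuristic described above (pages scanned since the last free, compared against 6 times the reclaimable memory), the check could look like this in C. The struct and function names are hypothetical and the bookkeeping is simplified; this is not the kernel's actual implementation.

```c
#include <stdbool.h>

/* Hypothetical per-zone counters for this sketch only. */
struct toy_zone {
    unsigned long pages_scanned;     /* scanned since the last page was freed */
    unsigned long reclaimable_pages; /* pages that could in principle be reclaimed */
};

/* The zone still counts as reclaimable until scanning reaches 6x its reclaimable pages. */
static bool toy_zone_reclaimable(const struct toy_zone *z)
{
    return z->pages_scanned < 6 * z->reclaimable_pages;
}

/* Reclaim scanned `n` more pages without freeing anything. */
static void toy_note_scanned(struct toy_zone *z, unsigned long n)
{
    z->pages_scanned += n;
}

/* A page was freed, so the scan counter starts over. */
static void toy_note_freed(struct toy_zone *z)
{
    z->pages_scanned = 0;
}
```

With 100 reclaimable pages, this sketch gives up once 600 pages have been scanned with nothing freed, and a single freed page resets the count.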

quoted

I'd like to know if that "progress is made" notion is possibly buggy.
Specifically, does it mean "progress is made by this task"?  Is it
possible that resource contention creates a situation where most tasks
in most cases can reclaim and allocate, but one task randomly fails to
make progress?

This can happen, although it is quite unlikely. We are trying to
throttle allocations but you can hardly fight consistent bad luck ;)

In order to see what is going on in your particular case we need an oom
report though.
--
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .