Re: [RFC PATCH 1/2] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_HARD with more useful semantic

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: 2016-06-13 14:54:21
Also in: lkml

Michal Hocko wrote:

On Sat 11-06-16 23:35:49, Tetsuo Handa wrote:

Michal Hocko wrote:

On Tue 07-06-16 21:11:03, Tetsuo Handa wrote:

Remaining __GFP_REPEAT users are not always doing costly allocations.

Yes but...

Sometimes they pass __GFP_REPEAT because the size is given from userspace.
Thus, unconditional s/__GFP_REPEAT/__GFP_RETRY_HARD/g is not good.

Would that be a regression though? Strictly speaking the __GFP_REPEAT
documentation was explicit to not loop for ever. So nobody should have
expected nofail semantic pretty much by definition. The fact that our
previous implementation was not fully conforming to the documentation is
just an implementation detail.  All the remaining users of __GFP_REPEAT
_have_ to be prepared for the allocation failure. So what exactly is the
problem with them?

A !costly allocation becomes weaker than now if __GFP_RETRY_HARD is passed.

That is true. But it is not weaker than the __GFP_REPEAT actually ever
promised. __GFP_REPEAT explicitly said to not retry _for_ever_. The
fact that we have ignored it is sad but that is what I am trying to
address here.

Whatever you rename __GFP_REPEAT to, it sounds strange to me that !costly
__GFP_REPEAT allocations are weaker than !costly !__GFP_REPEAT allocations.
Are you planning to make !costly !__GFP_REPEAT allocations behave like
__GFP_NORETRY?

 	/* Reclaim has failed us, start killing things */
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)

@@ -3719,6 +3731,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
+		passed_oom = true;

This is too premature. did_some_progress != 0 after returning from
__alloc_pages_may_oom() does not mean the OOM killer was invoked. It only means
that mutex_trylock(&oom_lock) was attempted.

which means that we have reached the OOM condition and _somebody_ is
actually handling the OOM on our behalf.

That _somebody_ might release oom_lock without invoking the OOM killer (e.g.
doing !__GFP_FS allocation), which means that we have reached the OOM condition
and nobody is actually handling the OOM on our behalf. __GFP_RETRY_HARD becomes
as weak as __GFP_NORETRY. I think this is a regression.
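
As a rough illustration of the case described above, here is a toy user-space
model, not kernel code. Every name in it (oom_attempt, oom_result,
model_may_oom) is hypothetical, and it assumes that both the "trylock failed"
path and the "!__GFP_FS" path report progress to the caller even though
nothing is killed.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct oom_attempt {
    bool lock_contended; /* another task already holds the OOM lock */
    bool may_use_fs;     /* models an allocation with __GFP_FS set */
};

struct oom_result {
    bool did_some_progress; /* what the caller of the OOM path sees */
    bool victim_killed;     /* whether a victim was actually killed */
};

static struct oom_result model_may_oom(struct oom_attempt a)
{
    struct oom_result r = { false, false };

    if (a.lock_contended) {
        /* trylock failed: assume the lock holder handles the OOM */
        r.did_some_progress = true;
        return r;
    }
    if (!a.may_use_fs) {
        /* lock was taken, but the killer is skipped for !__GFP_FS */
        r.did_some_progress = true;
        return r;
    }
    /* lock was taken and the killer ran */
    r.victim_killed = true;
    r.did_some_progress = true;
    return r;
}
```

In this model a caller that sets passed_oom whenever did_some_progress is true
cannot distinguish the last case from the first two.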

I really fail to see your point. We are talking about a gfp flag which
tells the allocator to retry as much as it is feasible. Getting through
all the reclaim attempts two times without any progress sounds like a
fair criterion. Well, we could try $NUM times but that wouldn't make too
much difference to what you are writing above. The fact whether somebody
has been killed or not is not really that important IMHO.

If the first round of reclaim attempts made no progress, the second round is
unlikely to make progress either unless the OOM killer kills something. Thus,
going through all the reclaim attempts twice without any progress and without
killing anybody is almost equivalent to going through them only once.
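
The argument can be sketched with a toy user-space model, not the kernel's
allocator. The names mem_state and try_alloc are hypothetical, and the model
assumes that nothing frees memory between passes except an explicit kill.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names and a deliberately simple state. */
struct mem_state {
    unsigned long free_pages;
    unsigned long reclaimable_pages;
    unsigned long victim_pages; /* freed only if a victim is killed */
};

/*
 * Run up to `passes` reclaim passes for a request of `want` pages.
 * If kill_between_passes is set, a failed pass is followed by a kill
 * that releases the victim's pages before the next pass.
 */
static bool try_alloc(struct mem_state m, unsigned long want,
                      int passes, bool kill_between_passes)
{
    for (int i = 0; i < passes; i++) {
        /* one full reclaim pass: take everything reclaimable */
        m.free_pages += m.reclaimable_pages;
        m.reclaimable_pages = 0;

        if (m.free_pages >= want)
            return true;

        if (kill_between_passes) {
            m.free_pages += m.victim_pages;
            m.victim_pages = 0;
        }
    }
    return false;
}
```

Under these assumptions, with no free or reclaimable pages left, one pass and
two passes fail alike unless a kill happens in between; only the kill changes
the outcome of the second pass.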
