Thread (27 messages), 7 authors, 2022-02-23

Re: [PATCH v4 1/1] mm: vmscan: Reduce throttling due to a failure to make progress

From: Shakeel Butt <hidden>
Date: 2021-12-02 16:31:11
Also in: linux-fsdevel, lkml, regressions

Hi Mel,

On Thu, Dec 2, 2021 at 7:07 AM Mel Gorman [off-list ref] wrote:
> Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
> problems due to reclaim throttling for excessive lengths of time.
> In Alexey's case, a memory hog that should go OOM quickly stalls for
> several minutes before going OOM. In Mike and Darrick's cases, a small
> memcg environment stalled excessively even though the system had enough
> memory overall.
>
> Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being
> made") introduced the problem although commit a19594ca4a8b ("mm/vmscan:
> increase the timeout if page reclaim is not making progress") made it
> worse. Systems at or near an OOM state that cannot be recovered must
> reach OOM quickly and memcg should kill tasks if a memcg is near OOM.

Is there a reason we can't simply revert 69392a403f49 instead of adding
more code/heuristics? Looking more into 69392a403f49, I don't think the
code and commit message are in sync.

For memcg reclaim, instead of just removing congestion_wait() or
replacing it with schedule_timeout() in mem_cgroup_force_empty(), why
change the behavior of all memcg reclaim? Also, this patch effectively
reverts that behavior of 69392a403f49.

For direct reclaimers under global pressure, why is the page allocator a
bad place to stall when reclaim makes no progress? IMHO the callers of
reclaim should decide what to do if reclaim is not making progress.
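
To make the "callers decide" point concrete, here is a rough userspace
sketch, not kernel code. Every name in it (reclaim_action, reclaim_state,
caller_decide) is hypothetical and exists only for illustration. It
assumes the reclaim path merely reports how many pages it freed, and the
caller applies its own policy on no progress:

```c
#include <stddef.h>

/* Hypothetical actions a caller might pick after one reclaim attempt. */
enum reclaim_action {
    RECLAIM_DONE,     /* enough pages were freed */
    RECLAIM_RETRY,    /* partial progress, try again immediately */
    RECLAIM_BACKOFF,  /* no progress yet, caller may sleep briefly */
    RECLAIM_GIVE_UP   /* repeated no progress, e.g. go OOM */
};

/* Hypothetical per-caller bookkeeping; the limit is the caller's choice. */
struct reclaim_state {
    unsigned int no_progress_loops; /* consecutive attempts freeing nothing */
    unsigned int max_no_progress;   /* attempts tolerated before giving up */
};

/*
 * Policy lives in the caller: the reclaim code only reports
 * nr_reclaimed, and this helper decides what to do about it.
 */
static enum reclaim_action
caller_decide(struct reclaim_state *st, size_t nr_reclaimed, size_t nr_wanted)
{
    if (nr_reclaimed >= nr_wanted) {
        st->no_progress_loops = 0;
        return RECLAIM_DONE;
    }
    if (nr_reclaimed > 0) {
        /* Any progress resets the no-progress counter. */
        st->no_progress_loops = 0;
        return RECLAIM_RETRY;
    }
    if (++st->no_progress_loops >= st->max_no_progress)
        return RECLAIM_GIVE_UP;
    return RECLAIM_BACKOFF;
}
```

With something along these lines, a memcg caller could set a small
max_no_progress so it reaches OOM quickly, while another caller could
tolerate more attempts, without the reclaim path itself throttling
everyone.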

thanks,
Shakeel