Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim

From: Huang, Ying <hidden>
Date: 2022-11-30 05:33:06
Also in: cgroups, lkml

Yang Shi [off-list ref] writes:

On Mon, Nov 28, 2022 at 4:54 PM Huang, Ying [off-list ref] wrote:

quoted

Yang Shi [off-list ref] writes:

quoted

On Wed, Nov 23, 2022 at 9:52 PM Huang, Ying [off-list ref] wrote:

quoted

Hi, Johannes,

Johannes Weiner [off-list ref] writes:
[...]

quoted

The fallback to reclaim actually strikes me as wrong.

Think of reclaim as 'demoting' the pages to the storage tier. If we
have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
CXL and from CXL to storage. If we reclaim a page from RAM, it means
we 'demote' it directly from RAM to storage, bypassing potentially a
huge amount of pages colder than it in CXL. That doesn't seem right.

If demotion fails, IMO it shouldn't satisfy the reclaim request by
breaking the layering. Rather it should deflect that pressure to the
lower layers to make room. This makes sure we maintain an aging
pipeline that honors the memory tier hierarchy.
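
A minimal userspace sketch of the layering described above, for illustration only. The struct tier type and demote_pages() are hypothetical and do not exist in the kernel; the model assumes the last tier in the array is the lowest memory tier, and that anything pushed out of it goes to storage.

```c
#include <stddef.h>

/* Hypothetical model: tiers[0] is the top tier (RAM), the last entry is
 * the lowest memory tier (e.g. CXL). Pages leaving the last tier go to
 * storage. */
struct tier {
    long used;      /* pages currently resident */
    long capacity;  /* pages the tier can hold */
};

/*
 * Push up to nr pages out of tiers[idx]. If the tier below is short of
 * room, pressure is deflected downward first, so no page skips a tier.
 * Returns the number of pages that ended up on storage.
 */
static long demote_pages(struct tier *tiers, size_t ntiers, size_t idx, long nr)
{
    long to_storage = 0;

    if (nr > tiers[idx].used)
        nr = tiers[idx].used;
    if (nr <= 0)
        return 0;

    if (idx + 1 == ntiers) {
        /* Lowest tier: "demotion" here is ordinary reclaim to storage. */
        tiers[idx].used -= nr;
        return nr;
    }

    struct tier *lower = &tiers[idx + 1];
    long free_below = lower->capacity - lower->used;

    if (free_below < nr) {
        /* Make room below by demoting the lower tier's pages first. */
        to_storage = demote_pages(tiers, ntiers, idx + 1, nr - free_below);
        free_below = lower->capacity - lower->used;
        if (nr > free_below)
            nr = free_below;  /* lower tier is simply too small */
    }

    tiers[idx].used -= nr;
    lower->used += nr;
    return to_storage;
}
```

With RAM and CXL both full, demoting 10 pages from RAM in this model sends 10 CXL pages to storage and moves the 10 RAM pages into CXL, rather than writing the RAM pages out directly.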

Yes.  I think that we should avoid falling back to reclaim as much as
possible too.  Now, when we allocate memory for demotion
(alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used.  So, we will trigger
kswapd reclaim on the lower tier node to free some memory and avoid falling
back to reclaim on the current (higher tier) node.  This may not be good
enough; for example, the following patch from Hasan may help by waking up
kswapd earlier.
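
To illustrate what allocating with only __GFP_KSWAPD_RECLAIM implies, here is a userspace sketch. It assumes that direct reclaim is masked out of the demotion allocation, which is my reading of alloc_demote_page() rather than something stated in this thread. The SKETCH_* names and bit values are invented for the example and are not the kernel's definitions.

```c
/* Made-up bit positions for illustration; not the kernel's real values. */
#define SKETCH_DIRECT_RECLAIM (1u << 0) /* caller may reclaim synchronously */
#define SKETCH_KSWAPD_RECLAIM (1u << 1) /* allocator may wake kswapd */
#define SKETCH_RECLAIM (SKETCH_DIRECT_RECLAIM | SKETCH_KSWAPD_RECLAIM)

/* Strip both reclaim bits from a base mask, then allow kswapd wakeup only. */
static unsigned int sketch_demotion_gfp(unsigned int base)
{
    return (base & ~SKETCH_RECLAIM) | SKETCH_KSWAPD_RECLAIM;
}

/* True if an allocation with this mask may reclaim in the caller's context. */
static int sketch_may_direct_reclaim(unsigned int gfp)
{
    return (gfp & SKETCH_DIRECT_RECLAIM) != 0;
}

/* True if an allocation with this mask may wake the background reclaimer. */
static int sketch_may_wake_kswapd(unsigned int gfp)
{
    return (gfp & SKETCH_KSWAPD_RECLAIM) != 0;
}
```

Under that assumption the demotion allocation never stalls on the target node; it either succeeds from free memory or fails quickly, with kswapd woken in the background.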

For the ideal case, I do agree with Johannes that we should demote pages
tier by tier rather than reclaiming them from the higher tiers. But I also
agree with your premature OOM concern.

quoted

https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/

Do you know what the next step is for this patch?

Should we do even more?

My initial implementation had simple throttle logic for the case where
demotion is not going to succeed because the demotion target does not
have enough free memory (just a watermark check) for migration to
succeed without doing any reclamation. Shall we resurrect that?
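
The watermark check mentioned above could look roughly like the following userspace sketch. The struct and function names are hypothetical, and which watermark the original patch compared against is not stated here, so a single generic threshold is assumed.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the demotion target node. */
struct wmark_node {
    long free_pages;  /* pages currently free on the target */
    long watermark;   /* free pages that must remain after migration */
};

/*
 * True if nr_pages can be migrated to the target without pushing its
 * free memory below the watermark, i.e. without needing any reclaim.
 */
static bool demotion_would_fit(const struct wmark_node *target, long nr_pages)
{
    return target->free_pages - nr_pages >= target->watermark;
}
```

A caller would throttle or skip demotion when this returns false, instead of attempting a migration that is expected to fail.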

Can you share the link to your throttle patch?  Or paste it here?

I just found this on the mailing list.
https://lore.kernel.org/linux-mm/1560468577-101178-8-git-send-email-yang.shi@linux.alibaba.com/

Per my understanding, this patch avoids demoting if there's no free
space on the demotion target?  If so, I think that we should trigger kswapd
reclaim on the demotion target before that.  And we can first simply avoid
falling back to reclaim, then avoid scanning as a later improvement, as in
your patch above.

Best Regards,
Huang, Ying

But it didn't have the throttling logic. I may not have submitted that
version to the mailing list, since we decided to drop this and merge mine
and Dave's.

Anyway, it is not hard to add the throttling logic; we already have a
few throttling cases in vmscan, for example, "mm/vmscan: throttle
reclaim until some writeback completes if congested".

quoted

Waking kswapd sooner is fine with me, but it may not be enough. For
example, kswapd may not keep up, so premature OOM may happen on
higher tiers or reclaim may still happen. I think throttling the
reclaimer/demoter until kswapd makes progress could avoid both. And
since lower-tier memory is typically much larger than the higher
tiers, the throttle should happen very rarely IMHO.

quoted

From another point of view, I still think that we can use falling back
to reclaim as the last resort to avoid OOM in some special situations,
for example, when most pages in the lowest tier node are mlock()ed or
too hot to be reclaimed.
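
Putting the two positions together (throttle until kswapd makes progress, fall back to reclaim only as a last resort), the policy might be sketched as below. This is a userspace illustration under stated assumptions, not a proposed patch: enum demote_action, struct demote_target, choose_action() and the max_stalls limit are all hypothetical.

```c
/* Hypothetical outcomes for a reclaimer that wants to demote pages. */
enum demote_action {
    ACT_DEMOTE,            /* target has room: migrate down one tier */
    ACT_THROTTLE,          /* wait for kswapd on the target to make progress */
    ACT_RECLAIM_FALLBACK   /* last resort: reclaim on the current node */
};

/* Hypothetical snapshot of the demotion target node. */
struct demote_target {
    long free_pages;         /* pages currently free */
    long watermark;          /* free pages that must remain */
    long reclaimable_pages;  /* 0 if everything is mlock()ed or too hot */
};

/*
 * Decide what to do with nr pages. stalls counts how many times the
 * caller has already throttled without the target gaining enough room.
 */
static enum demote_action choose_action(const struct demote_target *t,
                                        long nr, int stalls, int max_stalls)
{
    if (t->free_pages - nr >= t->watermark)
        return ACT_DEMOTE;
    if (t->reclaimable_pages > 0 && stalls < max_stalls)
        return ACT_THROTTLE;
    return ACT_RECLAIM_FALLBACK;
}
```

The fallback branch is reached only when the target cannot free anything or throttling has stopped helping, which matches the "special situations" case above.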

quoted

So I'm hesitant to design cgroup controls around the current behavior.

Best Regards,
Huang, Ying
