Thread (14 messages) 14 messages, 3 authors, 2021-12-23

Re: [PATCH 2/2] mm/damon: Add a new scheme to support demotion on tiered memory system

From: SeongJae Park <sj@kernel.org>
Date: 2021-12-22 08:43:30
Also in: lkml

On Tue, 21 Dec 2021 22:18:06 +0800 Baolin Wang [off-list ref] wrote:
Hi SeongJae,

On 12/21/2021 9:24 PM, SeongJae Park Wrote:
quoted
Hi Baolin,


Thank you for this great patch!  I left some comments below.

On Tue, 21 Dec 2021 17:18:04 +0800 Baolin Wang [off-list ref] wrote:
quoted
On tiered memory system, the reclaim path in shrink_page_list() already
support demoting pages to slow memory node instead of discarding the
pages. However, at that time the fast memory node memory wartermark is
already tense, which will increase the memory allocation latency during
page demotion.

We can rely on the DAMON in user space to help to monitor the cold
memory on fast memory node, and demote the cold pages to slow memory
node proactively to keep the fast memory node in a healthy state.
Thus this patch introduces a new scheme named DAMOS_DEMOTE to support
this feature.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
[...]
quoted
quoted
+ */
+static int damos_demote(struct damon_target *target, struct damon_region *r)
+{
+	struct mm_struct *mm;
+	LIST_HEAD(demote_pages);
+	LIST_HEAD(pagelist);
+	int target_nid, nr_pages, ret = 0;
+	unsigned int nr_succeeded, demoted_pages = 0;
+	struct page *page, *page2;
+
+	/* Validate if allowing to do page demotion */
+	if (!numa_demotion_enabled)
+		return -EINVAL;
+
+	mm = damon_get_mm(target);
+	if (!mm)
+		return -ENOMEM;
+
+	mmap_read_lock(mm);
+	walk_page_range(mm, PAGE_ALIGN(r->ar.start), PAGE_ALIGN(r->ar.end),
[...]
quoted
quoted
+			&damos_migrate_pages_walk_ops, &demote_pages);
+	mmap_read_unlock(mm);
+
+	mmput(mm);
+	if (list_empty(&demote_pages))
+		return 0;
+
+	list_for_each_entry_safe(page, page2, &demote_pages, lru) {
I'd prefer 'next' or 'next_page' instead of 'page2'.
Sure.
quoted
quoted
+		list_add(&page->lru, &pagelist);
+		target_nid = next_demotion_node(page_to_nid(page));
+		nr_pages = thp_nr_pages(page);
+
+		ret = migrate_pages(&pagelist, alloc_demote_page, NULL,
+				    target_nid, MIGRATE_SYNC, MR_DEMOTION,
+				    &nr_succeeded);
+		if (ret) {
+			if (!list_empty(&pagelist)) {
+				list_del(&page->lru);
+				mod_node_page_state(page_pgdat(page),
+						    NR_ISOLATED_ANON + page_is_file_lru(page),
+						    -nr_pages);
+				putback_lru_page(page);
+			}
+		} else {
+			__count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
+			demoted_pages += nr_succeeded;
+		}
Why should we do above work for each page on our own instead of exporting and
calling 'demote_page_list()'?
Cuase demote_page_list() is used for demote cold pages which are from 
one same fast memory node, they can have one same target demotion node.

But for the regions for the process, we can get some cold pages from 
different fast memory nodes (suppose we have mulptiple dram node on the 
system), so its target demotion node may be different. Thus we should 
migrate each cold pages with getting the correct target demotion node.
Thank you for the kind explanation.  But, I still want to reuse the code rather
than copying if possible.  How about below dumb ideas off the top of my head?

1. Call demote_page_list() for each page from here
2. Call demote_page_list() for each page from damos_migrate_pmd_entry()
3. Make damos_migrate_pages_walk_ops to configure multiple demote_pages lists,
   each per fast memory node, and calling demote_page_list() here for each
   per-fast-memory-node demote_pages list.
4. Make demote_page_list() handles this case and use it.  e.g., NULL pgdat
   parameter means the pages are not from same node.

Please let me know if I'm missing something.


Thanks,
SJ
Thanks for your comments.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help