Thread (21 messages) 21 messages, 5 authors, 2011-02-18

Re: [PATCH v5 1/4] deactivate invalidated pages

From: Minchan Kim <hidden>
Date: 2011-02-18 10:39:34
Also in: lkml

Hi Balbir,

On Fri, Feb 18, 2011 at 3:29 PM, Balbir Singh [off-list ref] wrote:
* MinChan Kim [off-list ref] [2011-02-18 00:08:19]:
quoted
Recently, there are reported problem about thrashing.
(http://marc.info/?l=rsync&m=128885034930933&w=2)
It happens by backup workloads(ex, nightly rsync).
That's because the workload makes just use-once pages
and touches pages twice. It promotes the page into
active list so that it results in working set page eviction.

Some app developer want to support POSIX_FADV_NOREUSE.
But other OSes don't support it, either.
(http://marc.info/?l=linux-mm&m=128928979512086&w=2)

By other approach, app developers use POSIX_FADV_DONTNEED.
But it has a problem. If kernel meets page is writing
during invalidate_mapping_pages, it can't work.
It makes for application programmer to use it since they always
have to sync data before calling fadivse(..POSIX_FADV_DONTNEED) to
make sure the pages could be discardable. At last, they can't use
deferred write of kernel so that they could see performance loss.
(http://insights.oetiker.ch/linux/fadvise.html)

In fact, invalidation is very big hint to reclaimer.
It means we don't use the page any more. So let's move
the writing page into inactive list's head if we can't truncate
it right now.

Why I move page to head of lru on this patch, Dirty/Writeback page
would be flushed sooner or later. It can prevent writeout of pageout
which is less effective than flusher's writeout.

Originally, I reused lru_demote of Peter with some change so added
his Signed-off-by.

Reported-by: Ben Gamari <redacted>
Signed-off-by: Minchan Kim <redacted>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <redacted>
Acked-by: Mel Gorman <redacted>
Reviewed-by: KOSAKI Motohiro <redacted>
Cc: Wu Fengguang <redacted>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <redacted>
Signed-off-by: Minchan Kim <redacted>
---
Changelog since v4:
 - Change function comments - suggested by Johannes
 - Change function name - suggested by Johannes
 - Drop only dirty/writeback pages to deactive pagevec - suggested by Johannes
 - Add acked-by

Changelog since v3:
 - Change function comments - suggested by Johannes
 - Change function name - suggested by Johannes
 - add only dirty/writeback pages to deactive pagevec

Changelog since v2:
 - mapped page leaves alone - suggested by Mel
 - pass part related PG_reclaim in next patch.

Changelog since v1:
 - modify description
 - correct typo
 - add some comment

 include/linux/swap.h |    1 +
 mm/swap.c            |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c        |   17 ++++++++---
 3 files changed, 91 insertions(+), 5 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4d55932..c335055 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -215,6 +215,7 @@ extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern int lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void swap_setup(void);

 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/swap.c b/mm/swap.c
index c02f936..4aea806 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,6 +39,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);

 /*
  * This path almost never happens for VM activity - pages are normally
@@ -347,6 +348,60 @@ void add_page_to_unevictable_list(struct page *page)
 }

 /*
+ * If the page can not be invalidated, it is moved to the
+ * inactive list to speed up its reclaim.  It is moved to the
+ * head of the list, rather than the tail, to give the flusher
+ * threads some time to write it out, as this is much more
+ * effective than the single-page writeout from reclaim.
+ */
+static void lru_deactivate(struct page *page, struct zone *zone)
+{
+     int lru, file;
+
+     if (!PageLRU(page) || !PageActive(page))
+             return;
+
+     /* Some processes are using the page */
+     if (page_mapped(page))
+             return;
+
+     file = page_is_file_cache(page);
+     lru = page_lru_base_type(page);
+     del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
+     ClearPageActive(page);
+     ClearPageReferenced(page);
+     add_page_to_lru_list(zone, page, lru);
+     __count_vm_event(PGDEACTIVATE);
+
+     update_page_reclaim_stat(zone, page, file, 0);
+}
+
+static void ____pagevec_lru_deactivate(struct pagevec *pvec)
+{
+     int i;
+     struct zone *zone = NULL;
+
+     for (i = 0; i < pagevec_count(pvec); i++) {
+             struct page *page = pvec->pages[i];
+             struct zone *pagezone = page_zone(page);
+
+             if (pagezone != zone) {
+                     if (zone)
+                             spin_unlock_irq(&zone->lru_lock);
+                     zone = pagezone;
+                     spin_lock_irq(&zone->lru_lock);
+             }
The optimization to avoid taking locks if the zone does not change is
quite subtle
I just used it without big considering as it's a normal technique of
page array handling we have been used. So I want to keep it if it
doesn't make big overhead.
quoted
+             lru_deactivate(page, zone);
+     }
+     if (zone)
+             spin_unlock_irq(&zone->lru_lock);
+
+     release_pages(pvec->pages, pvec->nr, pvec->cold);
+     pagevec_reinit(pvec);
+}
+
+
+/*
  * Drain pages out of the cpu's pagevecs.
  * Either "cpu" is the current CPU, and preemption has already been
  * disabled; or "cpu" is being hot-unplugged, and is already dead.
@@ -372,6 +427,29 @@ static void drain_cpu_pagevecs(int cpu)
              pagevec_move_tail(pvec);
              local_irq_restore(flags);
      }
+
+     pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+     if (pagevec_count(pvec))
+             ____pagevec_lru_deactivate(pvec);
+}
+
+/**
+ * deactivate_page - forcefully deactivate a page
+ * @page: page to deactivate
+ *
+ * This function hints the VM that @page is a good reclaim candidate,
+ * for example if its invalidation fails due to the page being dirty
+ * or under writeback.
+ */
+void deactivate_page(struct page *page)
+{
+     if (likely(get_page_unless_zero(page))) {
+             struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+             if (!pagevec_add(pvec, page))
+                     ____pagevec_lru_deactivate(pvec);
+             put_cpu_var(lru_deactivate_pvecs);
+     }
 }

 void lru_add_drain(void)
diff --git a/mm/truncate.c b/mm/truncate.c
index 4d415b3..9ec7bc5 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -328,11 +328,12 @@ EXPORT_SYMBOL(truncate_inode_pages);
  * pagetables.
  */
 unsigned long invalidate_mapping_pages(struct address_space *mapping,
-                                    pgoff_t start, pgoff_t end)
+             pgoff_t start, pgoff_t end)
 {
      struct pagevec pvec;
      pgoff_t next = start;
-     unsigned long ret = 0;
+     unsigned long ret;
+     unsigned long count = 0;
      int i;

      pagevec_init(&pvec, 0);
@@ -359,8 +360,14 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
                      if (lock_failed)
                              continue;

-                     ret += invalidate_inode_page(page);
-
+                     ret = invalidate_inode_page(page);
+                     /*
+                      * Invalidation is a hint that the page is no longer
+                      * of interest and try to speed up its reclaim.
+                      */
+                     if (!ret)
+                             deactivate_page(page);
Do we need to do this under page_lock? Is there scope for us to reuse
rotate_reclaimable_page() logic?
Good point.
I think we don't need page_lock. will fix.
About rotate_reclaimable_page, it has little bit similar logic but
several page flags test and irq disable are different so it would
result in ugly shape as far as I think.

I hope if you have a good idea, please, do refactoring after merging.

Thanks for the review, Balbir.

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help