Thread (27 messages) 27 messages, 7 authors, 2021-01-14

Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping

From: zhong jiang <hidden>
Date: 2021-01-13 10:05:59
Also in: linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm

On 2021/1/12 10:55 上午, Ruan Shiyang wrote:

On 2021/1/6 下午11:41, Jan Kara wrote:
quoted
On Thu 31-12-20 00:55:55, Shiyang Ruan wrote:
quoted
The current memory_failure_dev_pagemap() can only handle single-mapped
dax page for fsdax mode.  The dax page could be mapped by multiple 
files
and offsets if we let reflink feature & fsdax mode work together.  So,
we refactor current implementation to support handle memory failure on
each file and offset.

Signed-off-by: Shiyang Ruan <redacted>
Overall this looks OK to me, a few comments below.
quoted
---
  fs/dax.c            | 21 +++++++++++
  include/linux/dax.h |  1 +
  include/linux/mm.h  |  9 +++++
  mm/memory-failure.c | 91 
++++++++++++++++++++++++++++++++++-----------
  4 files changed, 100 insertions(+), 22 deletions(-)
...
quoted
quoted
  @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struct 
*tsk, struct page *p,
      }
        tk->addr = page_address_in_vma(p, vma);
-    if (is_zone_device_page(p))
-        tk->size_shift = dev_pagemap_mapping_shift(p, vma);
-    else
+    if (is_zone_device_page(p)) {
+        if (is_device_fsdax_page(p))
+            tk->addr = vma->vm_start +
+                    ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
It seems strange to use 'pgoff' for dax pages and not for any other 
page.
Why? I'd rather pass correct pgoff from all callers of add_to_kill() and
avoid this special casing...
Because one fsdax page can be shared by multiple pgoffs.  I have to 
pass each pgoff in each iteration to calculate the address in vma (for 
tk->addr).  Other kinds of pages don't need this. They can get their 
unique address by calling "page_address_in_vma()".
IMO,   an fsdax page can be shared by multiple files rather than 
multiple pgoffs if fs query support reflink.   Because an page only 
located in an mapping(page->mapping is exclusive),  hence it  only has 
an pgoff or index pointing at the node.

  or  I miss something for the feature ?  thanks,
So, I added this fsdax case here.  This patchset only implemented the 
fsdax case, other cases also need to be added here if to be implemented.


-- 
Thanks,
Ruan Shiyang.
quoted
quoted
+        tk->size_shift = dev_pagemap_mapping_shift(p, vma, tk->addr);
+    } else
          tk->size_shift = page_shift(compound_head(p));
        /*
@@ -495,7 +501,7 @@ static void collect_procs_anon(struct page 
*page, struct list_head *to_kill,
              if (!page_mapped_in_vma(page, vma))
                  continue;
              if (vma->vm_mm == t->mm)
-                add_to_kill(t, page, vma, to_kill);
+                add_to_kill(t, page, NULL, 0, vma, to_kill);
          }
      }
      read_unlock(&tasklist_lock);
@@ -505,24 +511,19 @@ static void collect_procs_anon(struct page 
*page, struct list_head *to_kill,
  /*
   * Collect processes when the error hit a file mapped page.
   */
-static void collect_procs_file(struct page *page, struct list_head 
*to_kill,
-                int force_early)
+static void collect_procs_file(struct page *page, struct 
address_space *mapping,
+        pgoff_t pgoff, struct list_head *to_kill, int force_early)
  {
      struct vm_area_struct *vma;
      struct task_struct *tsk;
-    struct address_space *mapping = page->mapping;
-    pgoff_t pgoff;
        i_mmap_lock_read(mapping);
      read_lock(&tasklist_lock);
-    pgoff = page_to_pgoff(page);
      for_each_process(tsk) {
          struct task_struct *t = task_early_kill(tsk, force_early);
-
          if (!t)
              continue;
-        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
-                      pgoff) {
+        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, 
pgoff) {
              /*
               * Send early kill signal to tasks where a vma covers
               * the page but the corrupted page is not necessarily
@@ -531,7 +532,7 @@ static void collect_procs_file(struct page 
*page, struct list_head *to_kill,
               * to be informed of all such data corruptions.
               */
              if (vma->vm_mm == t->mm)
-                add_to_kill(t, page, vma, to_kill);
+                add_to_kill(t, page, mapping, pgoff, vma, to_kill);
          }
      }
      read_unlock(&tasklist_lock);
@@ -550,7 +551,8 @@ static void collect_procs(struct page *page, 
struct list_head *tokill,
      if (PageAnon(page))
          collect_procs_anon(page, tokill, force_early);
      else
-        collect_procs_file(page, tokill, force_early);
+        collect_procs_file(page, page->mapping, page_to_pgoff(page),
Why not use page_mapping() helper here? It would be safer for THPs if 
they
ever get here...

                                Honza
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help