Thread: 13 messages, 4 authors, 2021-10-27

Re: [PATCH v1] mm, pagemap: expose hwpoison entry

From: Peter Xu <peterx@redhat.com>
Date: 2021-10-27 02:09:17
Also in: lkml

On Wed, Oct 27, 2021 at 08:27:36AM +0900, Naoya Horiguchi wrote:
On Mon, Oct 04, 2021 at 11:32:28PM +0900, Naoya Horiguchi wrote:
On Mon, Oct 04, 2021 at 01:55:30PM +0200, David Hildenbrand wrote:
On 04.10.21 13:50, Naoya Horiguchi wrote:
...
This patch also exposes hwpoison entries for hugepages. The example below
shows what pagemap reports when a memory error hits a hugepage mapped into
a process.

     $ ./page-types --no-summary --pid $PID --raw --list --addr 0x700000000+0x400
     voffset offset  len     flags
     700000000       12fa00  1       ___U_______Ma__H_G_________________f_______1
     700000001       12fa01  1ff     ___________Ma___TG_________________f_______1
     700000200       12f800  1       __________B________X_______________f______w_
     700000201       12f801  1       ___________________X_______________f______w_   // memory failure hit this page
     700000202       12f802  1fe     __________B________X_______________f______w_

Entries with both the "X" flag (hwpoison flag) and the "w" flag (swap
flag) are hwpoison entries, so every page in the 2MB range is
inaccessible to the process.  The actual error location can be found by
running page-types in physical address mode.

     $ ./page-types --no-summary --addr 0x12f800+0x200 --raw --list
     offset  len     flags
     12f800  1       __________B_________________________________
     12f801  1       ___________________X________________________
     12f802  1fe     __________B_________________________________

Signed-off-by: Naoya Horiguchi <redacted>
---
  fs/proc/task_mmu.c      | 41 ++++++++++++++++++++++++++++++++---------
  include/linux/swapops.h | 13 +++++++++++++
  tools/vm/page-types.c   |  7 ++++++-
  3 files changed, 51 insertions(+), 10 deletions(-)

Please also update the documentation located at

Documentation/admin-guide/mm/pagemap.rst
I will do this in the next post.
Reading the document, I found that the swap type is already exported, so we
could identify a hwpoison entry with it (without a new PM_HWPOISON bit).
One problem is that the format of swap types (like SWP_HWPOISON) depends
on a few config macros like CONFIG_DEVICE_PRIVATE and CONFIG_MIGRATION,
so we would also need to export how the swap type field is interpreted.
I had a similar question before, though it was about the generic swap entries
rather than the special ones.

The thing is, I don't know how userspace could interpret normal swap device
indexes read from pagemap.  Say we have two swap devices listed by "swapon
-s": I have no idea how we would know which device was allocated which swap
type index.  That looks like the same question asked above about special swap
types - the interface seems to be incomplete, if not entirely unused.
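
For reference, here is a minimal sketch of what userspace can decode from a
pagemap entry today, assuming the bit layout described in
Documentation/admin-guide/mm/pagemap.rst (bit 63 present, bit 62 swapped,
bits 0-4 swap type and bits 5-54 swap offset when swapped).  The function and
constant names are made up for illustration.  Nothing in the decoded fields
says which device, or which special entry kind, a given type index means.

```python
import struct

PM_PRESENT = 1 << 63             # page present in RAM
PM_SWAP = 1 << 62                # page is a swap entry
PM_PFRAME_MASK = (1 << 55) - 1   # bits 0-54: PFN, or swap type + offset
SWAP_TYPE_BITS = 5               # bits 0-4 hold the swap type when swapped


def decode_pagemap_entry(entry):
    """Split one 64-bit pagemap entry into the documented fields."""
    info = {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAP),
    }
    frame = entry & PM_PFRAME_MASK
    if info["swapped"]:
        info["swap_type"] = frame & ((1 << SWAP_TYPE_BITS) - 1)
        info["swap_offset"] = frame >> SWAP_TYPE_BITS
    elif info["present"]:
        # Per the documentation, readers without CAP_SYS_ADMIN see zero here.
        info["pfn"] = frame
    return info


def read_pagemap_entry(pid, vaddr, page_size=4096):
    """Read and decode the entry covering vaddr in /proc/<pid>/pagemap.

    page_size=4096 is an assumption; query the real base page size
    (e.g. os.sysconf("SC_PAGE_SIZE")) in anything serious.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * 8)   # one 8-byte entry per page
        data = f.read(8)
    if len(data) != 8:
        raise ValueError("address not covered by pagemap")
    (entry,) = struct.unpack("=Q", data)   # entries are native-endian u64
    return decode_pagemap_entry(entry)
```

The result only carries a bare "swap_type" number and an offset; turning
that number into "swap device X" or "hwpoison" needs information pagemap
itself does not provide.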

AFAIU the information "this page is swapped out to device X at offset Y" is
not reliable either, because page-in and page-out are transparent to userspace
and not under its control at all.  IOW, if the user reads that swap entry, then
reads the data at that offset from the disk and puts it somewhere else, the
data could already be stale if the kernel paged the page in after userspace
read the pagemap but before it read the disk.  I don't see any way to make this
right unless userspace can stop the kernel from paging in a swap entry.  That's
why I really wonder whether we should expose normal swap entries at all, as I
don't know how they could be useful and used in a 100% correct way.

Special swap entries seem a bit different - at least for is_pfn_swap_entry()
typed swap entries we can still expose the PFN, which might be helpful, though
I can't tell.

I once sent an email to Matt Mackall [off-list ref] and Dave Hansen
[off-list ref] asking about the above but didn't get a reply. Cc'ing them
again, this time with the list copied.
I thought of adding a new interface, for example under /sys/kernel/mm/swap/type_format/,
which shows info like the following (assuming that all of CONFIG_{DEVICE_PRIVATE,MIGRATION,MEMORY_FAILURE}
are enabled):

  $ ls /sys/kernel/mm/swap/type_format/
  hwpoison
  migration_read
  migration_write
  device_write
  device_read
  device_exclusive_write
  device_exclusive_read
  
  $ cat /sys/kernel/mm/swap/type_format/hwpoison
  25
  
  $ cat /sys/kernel/mm/swap/type_format/device_write
  28

Does this make sense, or is there a better approach?
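
To make the proposal concrete, a userspace consumer could look roughly like
the sketch below.  It assumes the type_format directory proposed above, which
does not exist in any kernel; the helper names are hypothetical, and the
directory is passed as a parameter so the sketch is not tied to that path.
Each file is assumed to hold a single decimal swap type number.

```python
import os


def load_swap_type_format(root):
    """Return {type_number: name} from a type_format-style directory.

    Assumes one file per special swap type, each containing one decimal
    number, as in the proposed (not merged) sysfs layout.
    """
    table = {}
    for name in sorted(os.listdir(root)):
        with open(os.path.join(root, name)) as f:
            table[int(f.read().strip())] = name
    return table


def classify_swap_type(swap_type, table):
    """Name a swap type read from pagemap, or None if it is not listed.

    None only means the table does not list the type; whether that implies
    a normal swap device is exactly the open question in this thread.
    """
    return table.get(swap_type)
```

With the example values shown above, a swap type of 25 would map to
"hwpoison" and 28 to "device_write" on that particular configuration.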
Then I'm wondering whether we also care about the normal swap devices in
pagemap, and if so whether we need to expose some information for them too
(only if there's a real use case, though).  Or should we just not expose swap
entries at all, at least not generic swap entries?  In that case we could still
expose things like hwpoison via well-defined PM_* bits.

Thanks,

-- 
Peter Xu
