Thread: 17 messages, 3 authors, 2021-02-24

Re: [RFC PATCH 2/5] hugetlb: enhance hugetlb fault processing to support soft dirty

From: Peter Xu <peterx@redhat.com>
Date: 2021-02-17 19:34:34
Also in: linux-s390, lkml

On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote:
hugetlb fault processing code would COW all write faults where the
pte was not writable.  Soft dirty will write protect ptes as part
of its tracking mechanism.  The existing hugetlb_cow code will do
the right thing for PRIVATE mappings as it checks map_count.  However,
for SHARED mappings it would actually allocate and install a COW page.
Modify the code to not call hugetlb_cow for SHARED mappings and just
update the pte.
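
The routing change described above can be sketched as a small user-space model. This is an illustration only, not kernel code: MODEL_VM_SHARED, enum fault_action and the route_before()/route_after() helpers are hypothetical names. Only the decision itself (COW for private mappings, update the pte in place for shared ones) follows the patch.

```c
#include <assert.h>   /* for the usage checks below */
#include <stdbool.h>

/* Hypothetical stand-in for VM_SHARED; the value is arbitrary. */
#define MODEL_VM_SHARED 0x1u

enum fault_action {
	ACTION_NONE,    /* not a write fault on a write-protected pte */
	ACTION_COW,     /* would call hugetlb_cow() */
	ACTION_MKWRITE, /* make the existing pte writable and dirty in place */
};

/* Before the patch: every write fault on a non-writable pte goes to COW. */
enum fault_action route_before(unsigned int vm_flags, bool write_fault,
			       bool pte_writable)
{
	(void)vm_flags; /* the old code did not consult VM_SHARED here */
	if (write_fault && !pte_writable)
		return ACTION_COW;
	return ACTION_NONE;
}

/* After the patch: only non-shared mappings are sent to hugetlb_cow(). */
enum fault_action route_after(unsigned int vm_flags, bool write_fault,
			      bool pte_writable)
{
	if (!write_fault || pte_writable)
		return ACTION_NONE;
	if (!(vm_flags & MODEL_VM_SHARED))
		return ACTION_COW;
	/* shared pte was write protected only for soft dirty tracking */
	return ACTION_MKWRITE;
}
```

For a shared mapping whose pte was write protected for soft dirty tracking, route_before() returns ACTION_COW while route_after() returns ACTION_MKWRITE.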

Signed-off-by: Mike Kravetz <redacted>
---
 mm/hugetlb.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 47f3123afd1a..b561b6867ec1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * spinlock. For private mappings, we also lookup the pagecache
 	 * page now as it is used to determine if a reservation has been
 	 * consumed.
+	 * Only non-shared mappings are sent to hugetlb_cow.
 	 */
-	if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+	if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) &&
+					!(vma->vm_flags & VM_SHARED)) {
 		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
@@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, haddr);
 
-		if (!(vma->vm_flags & VM_MAYSHARE))
-			pagecache_page = hugetlbfs_pagecache_page(h,
-								vma, haddr);
+		pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);

Pure question: I see that the check changed from VM_MAYSHARE to
VM_SHARED, and then I realized I'm unclear on the difference between the two.
When VM_MAYSHARE is set, can VM_SHARED ever be cleared?  Or, put another way,
is this change intended?

I see that vma_set_page_prot() removes VM_SHARED when soft dirty tracking is
enabled (which should make vma_wants_writenotify() return true, iiuc).
However, that removal only applies to a local copy of the flags used to
calculate vm_page_prot; it is never written back to vma->vm_flags.  I couldn't
find any place where VM_SHARED is cleared on a vma while VM_MAYSHARE is set.

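
To make the vma_set_page_prot() point concrete, here is a heavily simplified user-space model of the pattern described above: VM_SHARED is cleared only in a local copy of the flags that is used to derive the protection, while vma->vm_flags keeps it. The M_* flag values, struct model_vma and the one-bit prot encoding are made up for illustration, and the wants_writenotify parameter stands in for the result of vma_wants_writenotify().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, not the kernel's. */
#define M_VM_WRITE    0x2u
#define M_VM_SHARED   0x8u
#define M_VM_MAYSHARE 0x80u

#define M_PROT_WRITE  0x1u /* model "ptes are mapped writable" bit */

struct model_vma {
	unsigned int vm_flags;
	unsigned int vm_page_prot;
};

/*
 * Model of deriving protection from flags: only shared writable mappings
 * get a writable protection; private ones rely on COW.
 */
unsigned int model_prot_from_flags(unsigned int vm_flags)
{
	if ((vm_flags & M_VM_WRITE) && (vm_flags & M_VM_SHARED))
		return M_PROT_WRITE;
	return 0;
}

void model_vma_set_page_prot(struct model_vma *vma, bool wants_writenotify)
{
	unsigned int vm_flags = vma->vm_flags; /* local copy */

	if (wants_writenotify)              /* e.g. soft dirty tracking */
		vm_flags &= ~M_VM_SHARED;   /* only the copy is modified */
	vma->vm_page_prot = model_prot_from_flags(vm_flags);
	/* vma->vm_flags is never written here */
}
```

After model_vma_set_page_prot(&v, true) on a shared writable vma, v.vm_page_prot has no write bit, yet v.vm_flags still contains M_VM_SHARED.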
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
@@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & FAULT_FLAG_WRITE) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_cow(mm, vma, address, ptep,
-					  pagecache_page, ptl);
-			goto out_put_page;
+			if (!(vma->vm_flags & VM_SHARED)) {
+				ret = hugetlb_cow(mm, vma, address, ptep,
+						pagecache_page, ptl);
+				goto out_put_page;
+			}
+
+			/* write protected for soft dirty processing */
+			if ((vma->vm_flags & VM_WRITE) &&

This VM_WRITE check seems redundant.  For example, on x86,
do_user_addr_fault() checks this via access_error() right after the vma
lookup.  So if we reach here with FAULT_FLAG_WRITE set, VM_WRITE must already
be set, imho.

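
A minimal model of that argument, assuming only the x86 user page fault path (other callers of hugetlb_fault() are not modelled): access_error() rejects a write fault on a vma without VM_WRITE, so any write fault that gets through implies VM_WRITE. The model_* helpers and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define M_VM_WRITE 0x2u /* hypothetical value */

/* Mirrors only the write-fault case of access_error(): true means reject. */
bool model_access_error(bool write_fault, unsigned int vm_flags)
{
	return write_fault && !(vm_flags & M_VM_WRITE);
}

/* True if the fault would go on to the generic fault handler. */
bool model_reaches_fault_handler(bool write_fault, unsigned int vm_flags)
{
	return !model_access_error(write_fault, vm_flags);
}
```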
+					(vma->vm_flags & VM_SHARED))
+				entry = huge_pte_mkwrite(entry);

Same question for VM_SHARED: "(vma->vm_flags & VM_SHARED)" was just checked
above, and we would have gone to hugetlb_cow() otherwise.

+
+			entry = huge_pte_mkdirty(entry);

There's another huge_pte_mkdirty() right below; could we merge the two?
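
The three suggestions together can be checked with a small user-space model of the write-fault branch: patch_branch() follows the code as posted, and simplified_branch() drops the VM_WRITE/VM_SHARED re-checks and keeps a single mkdirty. The M_* bit values and the M_COW sentinel are hypothetical, and the equivalence is only claimed under the VM_WRITE invariant discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pte bits and vma flags (separate domains). */
#define M_PTE_WRITE 0x1u
#define M_PTE_DIRTY 0x2u
#define M_VM_WRITE  0x2u
#define M_VM_SHARED 0x8u
#define M_COW       0x100u /* sentinel: "went to hugetlb_cow()" */

/* Write-fault branch as written in the patch. */
unsigned int patch_branch(unsigned int entry, unsigned int vm_flags)
{
	if (!(entry & M_PTE_WRITE)) {
		if (!(vm_flags & M_VM_SHARED))
			return M_COW;
		if ((vm_flags & M_VM_WRITE) && (vm_flags & M_VM_SHARED))
			entry |= M_PTE_WRITE;   /* huge_pte_mkwrite() */
		entry |= M_PTE_DIRTY;           /* first huge_pte_mkdirty() */
	}
	entry |= M_PTE_DIRTY;                   /* second huge_pte_mkdirty() */
	return entry;
}

/* Same branch with the review suggestions applied. */
unsigned int simplified_branch(unsigned int entry, unsigned int vm_flags)
{
	if (!(entry & M_PTE_WRITE)) {
		if (!(vm_flags & M_VM_SHARED))
			return M_COW;
		entry |= M_PTE_WRITE; /* shared and, by invariant, VM_WRITE */
	}
	return entry | M_PTE_DIRTY;   /* single mkdirty */
}
```

With VM_WRITE set, both functions return the same value for every pte state. For a shared vma without VM_WRITE they differ (the posted version leaves the pte read-only but dirty), which is exactly the case the invariant rules out.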

Thanks,
 		}
 		entry = huge_pte_mkdirty(entry);
 	}
-- 
2.29.2
-- 
Peter Xu
