Re: [RFC PATCH 2/5] hugetlb: enhance hugetlb fault processing to support soft dirty
From: Peter Xu <peterx@redhat.com>
Date: 2021-02-17 19:34:34
Also in:
linux-s390, lkml
On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote:
quoted hunk ↗ jump to hunk
hugetlb fault processing code would COW all write faults where the pte was not writable. Soft dirty will write protect ptes as part of it's tracking mechanism. The existing hugetlb_cow code will do the right thing for PRIVATE mappings as it checks map_count. However, for SHARED mappings it would actually allocate and install a COW page. Modify the code to not call hugetlb_cow for SHARED mappings and just update the pte. Signed-off-by: Mike Kravetz <redacted> --- mm/hugetlb.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 47f3123afd1a..b561b6867ec1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c@@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * spinlock. For private mappings, we also lookup the pagecache * page now as it is used to determine if a reservation has been * consumed. + * Only non-shared mappings are sent to hugetlb_cow. */ - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && + !(vma->vm_flags & VM_SHARED)) { if (vma_needs_reservation(h, vma, haddr) < 0) { ret = VM_FAULT_OOM; goto out_mutex;@@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* Just decrements count, does not deallocate */ vma_end_reservation(h, vma, haddr); - if (!(vma->vm_flags & VM_MAYSHARE)) - pagecache_page = hugetlbfs_pagecache_page(h, - vma, haddr); + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);
Pure question: I see that the check actually changed from VM_MAYSHARE into VM_SHARE, then I noticed I'm actually unclear on the difference.. Say, when VM_MAYSHARE is set, could VM_SHARED be cleared in any case? Or say, is this change intended? I see that vma_set_page_prot() tried to remove VM_SHARED if soft dirty enabled (which should cause vma_wants_writenotify() to return true, iiuc), however that's temporary just to calculate vm_page_prot, and it's not applied to the vma->vm_flags. I failed to find a place where VM_SHARED of the vma is cleared while VM_MAYSHARE is set..
quoted hunk ↗ jump to hunk
} ptl = huge_pte_lock(h, mm, ptep);@@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (flags & FAULT_FLAG_WRITE) { if (!huge_pte_write(entry)) { - ret = hugetlb_cow(mm, vma, address, ptep, - pagecache_page, ptl); - goto out_put_page; + if (!(vma->vm_flags & VM_SHARED)) { + ret = hugetlb_cow(mm, vma, address, ptep, + pagecache_page, ptl); + goto out_put_page; + } + + /* write protected for soft dirty processing */ + if ((vma->vm_flags & VM_WRITE) &&
This VM_WRITE check seems to be redundant. As example, do_user_addr_fault() of x86 code will check this right after vma lookup by access_error(). So when reach here if "flags & FAULT_FLAG_WRITE", then VM_WRITE must be set, imho.
+ (vma->vm_flags & VM_SHARED)) + entry = huge_pte_mkwrite(entry);
Same question to VM_SHARED, since "(vma->vm_flags & VM_SHARED)" is just checked above and we'll go hugetlb_cow() otherwise.
+ + entry = huge_pte_mkdirty(entry);
There's another huge_pte_mkdirty() right below; likely we could merge them somehow? Thanks,
} entry = huge_pte_mkdirty(entry); } -- 2.29.2
-- Peter Xu