Thread (16 messages) 16 messages, 4 authors, 2021-05-27

Re: [External] [PATCH 0/2] Track reserve map changes to restore on error

From: Mike Kravetz <hidden>
Date: 2021-05-27 16:08:27
Also in: lkml

On 5/26/21 7:48 PM, Mina Almasry wrote:
quoted hunk ↗ jump to hunk
On Wed, May 26, 2021 at 4:19 PM Mina Almasry [off-list ref] wrote:
quoted
On Wed, May 26, 2021 at 10:17 AM Mike Kravetz [off-list ref] wrote:
quoted
On 5/25/21 8:19 PM, Muchun Song wrote:
quoted
On Wed, May 26, 2021 at 7:31 AM Mike Kravetz [off-list ref] wrote:
quoted
Here is a modification to the reservation tracking for fixup on errors.
It is a more general change, but should work for the hugetlb_mcopy_pte_atomic
case as well.

Perhaps use this as a prerequisite for your fix(es)?  Pretty sure this
will eliminate the need for the call to hugetlb_unreserve_pages.
Hi Mike,

It seems like someone is fixing a bug, right? Maybe a link should be
placed in the cover letter so that someone can know what issue
we are facing.
Thanks Muchun,

I wanted to first see if these patches would work in the code Mina is
modifying.  If this works for Mina, then a more formal patch and request
for inclusion will be sent.
So a quick test: I apply my patche and yours on top of linus/master,
and I remove the hugetlb_unreserve_pages() call that triggered this
conversation, and run the userfaultfd test, resv_huge_pages underflows
again, so it seems on the surface this doesn't quite work as is.

Not quite sure what to do off the top of my head. I think I will try
to debug why the 3 patches don't work together and I will fix either
your patch or mine. I haven't taken a deep look yet; I just ran a
quick test.
Ok found the issue. With the setup I described above, the
hugetlb_shared test case passes:

./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2
/tmp/kokonut_test/huge/userfaultfd_test && echo test success

The non-shared test case is the one that underflows:

./tools/testing/selftests/vm/userfaultfd hugetlb 10 2
/tmp/kokonut_test/huge/userfaultfd_test && echo test success

I've debugged a bit, and this messy hunk 'fixes' the underflow with
the non-shared case. (Sorry for the messiness).
@@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate
*h, struct vm_area_struct *vma,
                                 */
                                SetHPageRestoreRsvCnt(page);
                } else {
-                       rc = vma_needs_reservation(h, vma, address);
-                       if (rc < 0)
-                               /*
-                                * See above comment about rare out of
-                                * memory condition.
-                                */
-                               SetHPageRestoreRsvCnt(page);
-                       else if (rc)
-                               vma_add_reservation(h, vma, address);
-                       else
-                               vma_end_reservation(h, vma, address);
+                       resv = inode_resv_map(vma->vm_file->f_mapping->host);
+                       if (resv) {
+                               int chg = region_del(resv, idx, idx+1);
+                               VM_BUG_ON(chg);
+                       }

The reason being is that on page allocation we region_add() an entry
into the resv_map regardless of whether this is a shared mapping or
not (vma_needs_reservation() + vma_commit_reservation(), which amounts
to region_add() at the end of the day).

To unroll back this change on error, we need to region_del() the region_add().

The code removed above doesn't end up calling region_del(), because
vma_needs_reservation() returns 0, because region_chg() sees there is
an entry in the resv_map, and returns 0.

The VM_BUG_ON() is just because I'm not sure how to handle that error.
Thanks Mina!

Yes, that new block of code in restore_reserve_on_error is incorrect for
the private mapping case.  Since alloc_huge_page does the region_add for
both shared and private mappings, it seems we should just do the
region_del for both.  I'll update this patch to fix this and take your
other comments into account.

-- 
Mike Kravetz
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help