Re: [External] [PATCH 0/2] Track reserve map changes to restore on error
From: Mike Kravetz <hidden>
Date: 2021-05-27 16:08:27
Also in:
lkml
On 5/26/21 7:48 PM, Mina Almasry wrote:
quoted hunk ↗ jump to hunk
On Wed, May 26, 2021 at 4:19 PM Mina Almasry [off-list ref] wrote:quoted
On Wed, May 26, 2021 at 10:17 AM Mike Kravetz [off-list ref] wrote:quoted
On 5/25/21 8:19 PM, Muchun Song wrote:quoted
On Wed, May 26, 2021 at 7:31 AM Mike Kravetz [off-list ref] wrote:quoted
Here is a modification to the reservation tracking for fixup on errors. It is a more general change, but should work for the hugetlb_mcopy_pte_atomic case as well. Perhaps use this as a prerequisite for your fix(es)? Pretty sure this will eliminate the need for the call to hugetlb_unreserve_pages.Hi Mike, It seems like someone is fixing a bug, right? Maybe a link should be placed in the cover letter so that someone can know what issue we are facing.Thanks Muchun, I wanted to first see if these patches would work in the code Mina is modifying. If this works for Mina, then a more formal patch and request for inclusion will be sent.So a quick test: I apply my patche and yours on top of linus/master, and I remove the hugetlb_unreserve_pages() call that triggered this conversation, and run the userfaultfd test, resv_huge_pages underflows again, so it seems on the surface this doesn't quite work as is. Not quite sure what to do off the top of my head. I think I will try to debug why the 3 patches don't work together and I will fix either your patch or mine. I haven't taken a deep look yet; I just ran a quick test.Ok found the issue. With the setup I described above, the hugetlb_shared test case passes: ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success The non-shared test case is the one that underflows: ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success I've debugged a bit, and this messy hunk 'fixes' the underflow with the non-shared case. (Sorry for the messiness).@@ -2329,17 +2340,14 @@ void restore_reserve_on_error(struct hstate*h, struct vm_area_struct *vma, */ SetHPageRestoreRsvCnt(page); } else { - rc = vma_needs_reservation(h, vma, address); - if (rc < 0) - /* - * See above comment about rare out of - * memory condition. - */ - SetHPageRestoreRsvCnt(page); - else if (rc) - vma_add_reservation(h, vma, address); - else - vma_end_reservation(h, vma, address); + resv = inode_resv_map(vma->vm_file->f_mapping->host); + if (resv) { + int chg = region_del(resv, idx, idx+1); + VM_BUG_ON(chg); + } The reason being is that on page allocation we region_add() an entry into the resv_map regardless of whether this is a shared mapping or not (vma_needs_reservation() + vma_commit_reservation(), which amounts to region_add() at the end of the day). To unroll back this change on error, we need to region_del() the region_add(). The code removed above doesn't end up calling region_del(), because vma_needs_reservation() returns 0, because region_chg() sees there is an entry in the resv_map, and returns 0. The VM_BUG_ON() is just because I'm not sure how to handle that error.
Thanks Mina! Yes, that new block of code in restore_reserve_on_error is incorrect for the private mapping case. Since alloc_huge_page does the region_add for both shared and private mappings, it seems we should just do the region_del for both. I'll update this patch to fix this and take your other comments into account. -- Mike Kravetz