Thread: 12 messages, 4 authors, 2021-06-07

Re: [PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY

From: Mina Almasry <hidden>
Date: 2021-06-01 00:12:06
Also in: lkml

On Mon, May 31, 2021 at 4:25 PM Andrew Morton [off-list ref] wrote:
> On Thu, 27 May 2021 17:50:29 -0700 Mina Almasry [off-list ref] wrote:
>
>> On UFFDIO_COPY, if we fail to copy the page contents while holding the
>> hugetlb_fault_mutex, we will drop the mutex and return to the caller
>> after allocating a page that consumed a reservation. In this case there
>> may be a fault that double consumes the reservation. To handle this, we
>> free the allocated page, fix the reservations, and allocate a temporary
>> hugetlb page and return that to the caller. When the caller does the
>> copy outside of the lock, we again check the cache, and allocate a page
>> consuming the reservation, and copy over the contents.
>>
>> Test:
>> Hacked the code locally such that resv_huge_pages underflows produce
>> a warning and the copy_huge_page_from_user() always fails, then:
>>
>> ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
>>         2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
>> ./tools/testing/selftests/vm/userfaultfd hugetlb 10
>>       2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
>>
>> Both tests succeed and produce no warnings. After the
>> test runs number of free/resv hugepages is correct.
>
> Many conflicts here with material that is queued for 5.14-rc1.
>
> How serious is this problem?  Is a -stable backport warranted?

I've sent 2 similar patches to the list:

1. "[PATCH v4] mm, hugetlb: Fix simple resv_huge_pages underflow on UFFDIO_COPY"

This one is sent to -stable and linux-mm and is a fairly simple fix.

2. "[PATCH v4] mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY"

That is this patch. It's a more complicated, non-critical fix, so it is
not targeted for -stable and was sent only to linux-mm.

> If we decide to get this into 5.13 (and perhaps -stable) then I can
> take a look at reworking all the 5.14 material on top.  If not very
> serious then we could rework this on top of the already queued
> material.

I assume given the above we want to rework this on top of the already
queued material. I can upload a v5 that is rebased on top of your
branch. Note that you have an earlier version of this fix in your
branch, so really this patch will turn into a fix for that patch if I
rebase it (I assume that's fine).
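
For readers following along, the flow in the quoted patch description can be
sketched as a toy user-space model. This is not kernel code and not the
patch itself: ToyPool, ResvUnderflow and toy_uffdio_copy are hypothetical
names invented for illustration, the temporary page is modelled as a plain
buffer that never touches the reservation counter, and returning "exists"
when a racing fault has already installed the page is an assumption of this
sketch. The restore=False path models the racy behaviour the description
warns about.

```python
from dataclasses import dataclass, field


class ResvUnderflow(Exception):
    """Raised when the toy reservation counter would go below zero."""


@dataclass
class ToyPool:
    """Toy stand-in for the hugetlb reservation state (not kernel code)."""
    resv_huge_pages: int
    page_cache: dict = field(default_factory=dict)  # page index -> contents

    def alloc_reserved(self):
        # Allocating a page for a reserved mapping consumes one reservation.
        if self.resv_huge_pages == 0:
            raise ResvUnderflow("resv_huge_pages would underflow")
        self.resv_huge_pages -= 1

    def restore_reservation(self):
        # Undo the consumption after freeing the page that caused it.
        self.resv_huge_pages += 1

    def fault(self, idx):
        # A concurrent page fault on the same index: allocate and install.
        if idx not in self.page_cache:
            self.alloc_reserved()
            self.page_cache[idx] = b"\0"


def toy_uffdio_copy(pool, idx, data, racing_fault=False, restore=True):
    """Model a UFFDIO_COPY whose first copy attempt (under the mutex) fails."""
    # Attempt 1, mutex held: allocate (consumes a reservation); copy fails.
    pool.alloc_reserved()
    if restore:
        # Fixed flow: free the page and give the reservation back ...
        pool.restore_reservation()
    # ... then hand a temporary page (no reservation) back to the caller.
    temp_page = bytearray(len(data))

    # Mutex dropped: the caller copies into the temporary page, and another
    # fault on the same index may run in this window.
    temp_page[:] = data
    if racing_fault:
        pool.fault(idx)

    # Attempt 2, mutex held again: re-check the cache before allocating.
    if idx in pool.page_cache:
        return "exists"  # assumption of this sketch: the racing fault wins
    if restore:
        pool.alloc_reserved()
    pool.page_cache[idx] = bytes(temp_page)
    return "copied"
```

Starting from ToyPool(resv_huge_pages=1), the restore=True flow ends with the
counter at 0 whether or not a fault races in. With restore=False and
racing_fault=True, the racing fault tries to consume the same reservation a
second time and the model raises ResvUnderflow.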