Thread: 11 messages, 4 authors, 2021-09-27

Re: [PATCH] mm/userfaultfd: selftests: Fix memory corruption with thp enabled

From: Peter Xu <peterx@redhat.com>
Date: 2021-09-24 19:59:44
Also in: lkml

On Fri, Sep 24, 2021 at 10:21:30AM -0700, Axel Rasmussen wrote:
On Thu, Sep 23, 2021 at 4:25 PM Peter Xu [off-list ref] wrote:
In RHEL's gating selftests we've encountered memory corruption in the uffd
event test even with an upstream kernel:

        # ./userfaultfd anon 128 4
        nr_pages: 32768, nr_pages_per_cpu: 32768
        bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
        bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
        bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
        bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
        testing uffd-wp with pagemap (pgsize=4096): done
        testing uffd-wp with pagemap (pgsize=2097152): done
        testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
        ERROR: faulting process failed (errno=0, line=1117)

It can be easily reproduced when thp is enabled globally, which is the default
for RHEL.

It's also known to be a side effect of commit 0db282ba2c12 ("selftest: use mmap
instead of posix_memalign to allocate memory", 2021-07-23), which imho is right
in itself to use mmap() to make sure the addresses will be untagged even on arm.

The problem is, for each test we allocate buffers using two allocate_area()
calls.  We assumed these two buffers won't affect each other, however they
could, because mmap() could have placed the two buffers next to each other
with the same VMA flags, so they got merged into one VMA.
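
The merge described above can be observed from userspace. Below is a minimal sketch, not code from the selftest: count_vmas() and adjacent_buffers_vma_count() are hypothetical helpers, and the adjacency is forced with MAP_FIXED over a reserved range instead of relying on where mmap() happens to place things. It maps two anonymous buffers back to back with identical protection and flags, then counts the /proc/self/maps entries that cover the pair.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Count the /proc/self/maps entries (VMAs) overlapping [addr, addr + len). */
int count_vmas(const void *addr, size_t len)
{
	unsigned long lo = (unsigned long)addr, hi = lo + len;
	unsigned long start, end;
	char *line = NULL;
	size_t cap = 0;
	int n = 0;
	FILE *f = fopen("/proc/self/maps", "r");

	if (!f)
		return -1;
	while (getline(&line, &cap, f) != -1) {
		if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
		    start < hi && end > lo)
			n++;
	}
	free(line);
	fclose(f);
	return n;
}

/*
 * Map two anonymous buffers back to back with the same protection and
 * flags, and return how many VMAs cover the pair (1 means they merged).
 * Returns -1 on error. len must be a multiple of the page size.
 */
int adjacent_buffers_vma_count(size_t len)
{
	const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	char *base, *a, *b;
	int n;

	/* Reserve an address range so the two buffers end up adjacent. */
	base = mmap(NULL, 2 * len, PROT_NONE, flags, -1, 0);
	if (base == MAP_FAILED)
		return -1;
	a = mmap(base, len, PROT_READ | PROT_WRITE, flags | MAP_FIXED, -1, 0);
	b = mmap(base + len, len, PROT_READ | PROT_WRITE, flags | MAP_FIXED, -1, 0);
	if (a == MAP_FAILED || b == MAP_FAILED) {
		munmap(base, 2 * len);
		return -1;
	}
	n = count_vmas(base, 2 * len);
	munmap(base, 2 * len);
	return n;
}
```

On a typical Linux system one would expect adjacent_buffers_vma_count() to return 1, i.e. the kernel merged the two mappings, although merging is an optimization the kernel is free to skip.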

It won't be a big problem if thp is not enabled, but when thp is aggressively
enabled it means that initializing the src buffer could accidentally set up
part of the dest buffer too, when there's a shared THP that overlaps the two
regions.  Then some of the dest buffer can't be trapped by userfaultfd missing
mode, which causes the memory corruption described above.
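
To make the overlap concrete, here is a small arithmetic sketch. The helper name dst_bytes_in_shared_hpage() is hypothetical, and it assumes src is immediately followed by dst in one VMA, with the 2 MiB huge page size shown in the log above. If the src/dst boundary is not huge-page aligned, the huge page that backs the tail of src also backs the head of dst.

```c
#include <stdint.h>

/*
 * Given the address where src ends and dst begins, return how many bytes
 * of dst fall inside the huge page that also covers the tail of src.
 * Returns 0 when the boundary is huge-page aligned (no shared huge page).
 */
uint64_t dst_bytes_in_shared_hpage(uint64_t boundary, uint64_t hpage_size)
{
	uint64_t off = boundary % hpage_size;

	return off ? hpage_size - off : 0;
}
```

For example, a boundary that sits 1 MiB into a 2 MiB huge page leaves the first 1 MiB of dst (256 pages of 4096 bytes) populated as soon as that huge page is faulted in through src.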

To fix it, do release_pages() after initializing the src buffer.

But, if I understand correctly, release_pages() will just free the
physical pages, but not touch the VMA(s). So, with the right
max_ptes_none setting, why couldn't khugepaged just decide to
re-collapse (with zero pages) immediately after we release the pages,
causing the same problem? It seems to me this change just
significantly narrows the race window (which explains why we see less
of the issue), but doesn't fix it fundamentally.

Did you mean you can reproduce the issue even with this patch?

It is a good point anyway, indeed I don't see anything that stops it from
happening.
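
The point about the VMA surviving can be sketched as follows. This assumes release_pages() boils down to madvise(MADV_DONTNEED) for anonymous memory; release_keeps_mapping() is a hypothetical helper, not a selftest function. The pages are dropped, but the mapping stays valid and simply refaults zero-filled pages, so nothing at the VMA level prevents the range from being populated again.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Populate an anonymous buffer, drop its pages with MADV_DONTNEED, then
 * touch it again. Returns 1 if the mapping is still usable and reads back
 * as zeroes, 0 if not, -1 on error. len must be a multiple of the page size.
 */
int release_keeps_mapping(size_t len)
{
	unsigned char *buf;
	int ok;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault the pages in */

	if (madvise(buf, len, MADV_DONTNEED)) {
		munmap(buf, len);
		return -1;
	}

	/* The VMA is untouched: these reads refault fresh zero pages. */
	ok = buf[0] == 0 && buf[len - 1] == 0;

	munmap(buf, len);
	return ok;
}
```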

I wanted to prepare a v2 that releases the pages after uffdio registration,
where we'll do the vma split, but it won't simply work: release_pages() will
cause the process to hang forever, since that test registers with EVENT_REMOVE,
and release_pages() upon the thp will trigger a synchronous EVENT_REMOVE which
cannot be handled by anyone.

Another solution is to map some PROT_NONE regions between the buffers, to make
sure they won't share a VMA.  I'll need to think more about which is better..
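
A rough sketch of the PROT_NONE idea, under assumptions: alloc_with_guard() is a hypothetical helper, not the eventual selftest change, and it carves both buffers out of a single reservation instead of mapping a separate guard region. Because the gap keeps a different protection from its neighbours, src and dst cannot end up in the same VMA, and a THP never spans VMAs.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Reserve src | guard | dst as one PROT_NONE range, then open up only the
 * two buffers. The guard stays PROT_NONE, which keeps src and dst in
 * separate VMAs. len and guard must be multiples of the page size.
 * Returns 0 on success, -1 on error.
 */
int alloc_with_guard(size_t len, size_t guard, char **src, char **dst)
{
	size_t total = 2 * len + guard;
	char *base;

	base = mmap(NULL, total, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	if (mprotect(base, len, PROT_READ | PROT_WRITE) ||
	    mprotect(base + len + guard, len, PROT_READ | PROT_WRITE)) {
		munmap(base, total);
		return -1;
	}

	*src = base;
	*dst = base + len + guard;
	return 0;
}
```

Any non-zero guard size separates the VMAs; making it a multiple of the huge page size would additionally keep the two buffers at the same alignment relative to each other.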

-- 
Peter Xu
