Thread (64 messages), 7 authors, 2025-04-22

Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

From: Lorenzo Stoakes <hidden>
Date: 2025-02-18 17:21:11
Also in: linux-kselftest, linux-mm, lkml

On Tue, Feb 18, 2025 at 06:14:00PM +0100, David Hildenbrand wrote:
On 18.02.25 17:43, Lorenzo Stoakes wrote:
On Tue, Feb 18, 2025 at 04:20:18PM +0100, David Hildenbrand wrote:
Right yeah that'd be super weird. And I don't want to add that logic.
Also not sure what happens if one does an mlock()/mlockall() after
already installing PTE markers.
The existing logic already handles non-present cases by skipping them, in
mlock_pte_range():

	for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) {
		ptent = ptep_get(pte);
		if (!pte_present(ptent))
			continue;

		...
	}
I *think* that code only updates already-mapped folios, to properly call
mlock_folio()/munlock_folio().
Guard regions _are_ 'already mapped' :) so it leaves them in place.
"mapped folios" -- there is no folio mapped. Yes, the VMA is in place.
We're engaging in a moot discussion on this I think but I mean it appears
to operate by walking page tables if they are populated, which they will be
for guard regions, but when it finds it's non-present it will skip.

This amounts to the same thing as not doing anything, obviously.
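
To make the "walk, find non-present, skip" point concrete, here is a toy userspace model of the loop quoted above. It is only a sketch: struct toy_pte and toy_mlock_range() are hypothetical names invented for illustration, not kernel types or functions, and the model ignores everything except the present check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page table entry; NOT the kernel's pte_t. */
struct toy_pte {
	bool present;		/* a folio is mapped here */
	bool guard_marker;	/* non-present entry carrying a guard marker */
	bool mlocked;		/* set by the walk, for present entries only */
};

/*
 * Walk the entries the way the quoted loop does: anything that is not
 * present (guard markers included) is skipped and left untouched.
 * Returns the number of entries that were marked as locked.
 */
size_t toy_mlock_range(struct toy_pte *ptes, size_t n)
{
	size_t locked = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ptes[i].present)
			continue;
		ptes[i].mlocked = true;
		locked++;
	}
	return locked;
}
```

In this model a guard marker entry comes out of the walk exactly as it went in, which is the "same thing as not doing anything" outcome described above.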
do_mlock() -> apply_vma_lock_flags() -> mlock_fixup() -> mlock_vma_pages_range()
implies this will be invoked.
Yes, to process any already mapped folios, to then continue population
later.
It is not the code that populates pages on mlock()/mlockall(). I think all
that goes via mm_populate()/__mm_populate(), where "ordinary GUP" should
apply.
OK I want to correct what I said earlier.

Installing a guard region then attempting mlock() will result in an error. The
populate will -EFAULT and stop at the guard region, which causes mlock() to
error out.
Right, that's my expectation.
OK good!
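
For anyone wanting to see this from userspace, a minimal sketch follows. Assumptions: an anonymous private mapping, a kernel with guard region support (6.13 or later), and the MADV_GUARD_INSTALL value taken from the uapi headers as a fallback for older libc headers. The function name is made up for this example, and the errno userspace sees is deliberately not checked, since it need not match the internal -EFAULT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Value from the Linux 6.13+ uapi headers; older libc headers may lack it. */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif

/*
 * Returns  1 if mlock() failed with a guard region inside the range,
 *          0 if mlock() unexpectedly succeeded,
 *         -1 if the experiment could not run (old kernel, no mlock rights...).
 */
int mlock_over_guard_region(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 4 * (size_t)page;
	int ret = -1;
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Probe: make sure plain mlock() works here (RLIMIT_MEMLOCK etc.). */
	if (mlock(p, len) != 0 || munlock(p, len) != 0)
		goto out;

	/* Turn the second page into a guard region. */
	if (madvise(p + page, (size_t)page, MADV_GUARD_INSTALL) != 0)
		goto out;	/* e.g. EINVAL on kernels without guard regions */

	/* Population should stop at the guard page, so mlock() should fail. */
	ret = mlock(p, len) != 0 ? 1 : 0;
out:
	munmap(p, len);
	return ret;
}
```

The probe step is there so that a failure caused by RLIMIT_MEMLOCK is not mistaken for a failure caused by the guard region.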
This is a partial failure, so the VMA is split and has VM_LOCKED applied, but
the populate halts at the guard region.

This is ok as per previous discussion on aggregate operation failure, there can
be no expectation of 'unwinding' of partially successful operations that form
part of a requested aggregate one.

However, given there's stuff to clean up, and on error a user _may_ wish to then
remove guard regions and try again, I guess there's no harm in keeping the code
as it is where we allow MADV_GUARD_REMOVE even if VM_LOCKED is in place.
Likely yes; it's all weird code.
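
The cleanup-and-retry path mentioned above could look roughly like the sketch below. This is illustration only, under the same assumptions as any userspace guard region test (anonymous private mapping, kernel 6.13 or later, fallback MADV_* values from the uapi headers). The function name is hypothetical, and as noted elsewhere in the thread a sane program would normally just give up instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values from the Linux 6.13+ uapi headers; older libc headers may lack them. */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

/*
 * Returns  1 if mlock() succeeded once the guard region was removed,
 *          0 if it still failed,
 *         -1 if the experiment could not run.
 */
int mlock_retry_after_guard_remove(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 4 * (size_t)page;
	int ret = -1;
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Probe: skip if plain mlock() is not permitted here. */
	if (mlock(p, len) != 0 || munlock(p, len) != 0)
		goto out;
	if (madvise(p + page, (size_t)page, MADV_GUARD_INSTALL) != 0)
		goto out;

	/* Expected to fail part-way; VM_LOCKED may remain set on the VMA. */
	(void)mlock(p, len);

	/* Clean up the guard region, then ask for the lock again. */
	if (madvise(p, len, MADV_GUARD_REMOVE) != 0)
		goto out;
	ret = mlock(p, len) == 0 ? 1 : 0;
out:
	munmap(p, len);
	return ret;
}
```

The second mlock() call matters because removing the guard markers does not by itself populate the pages they covered.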
See populate_vma_page_range(), especially also the VM_LOCKONFAULT handling.
Yeah that code is horrible, you just reminded me of it... 'rightly or wrongly'
yeah wrongly, very wrongly...
Which covers off guard regions. Removing the guard regions after this will
leave you in a weird situation where these entries will be zapped... maybe
we need a patch to make MADV_GUARD_REMOVE check VM_LOCKED and in this case
also populate?
Maybe? Or we say that it behaves like MADV_DONTNEED_LOCKED.
See above, no we should not :P this is only good for cleanup after mlock()
failure, although no sane program should really be trying to do this, a sane
program would give up here (and it's a _programmatic error_ to try to mlock() a
range with guard regions).
Some apps use mlockall(), and it might be nice to just be able to use guard
pages as if "Nothing happened".
Sadly I think not given above :P
QEMU, for example, will issue an mlockall(MCL_CURRENT | MCL_FUTURE) when
requested, and then exit() if it fails.
Hm under what circumstances? I use qemu extensively to test this stuff with
no issues. Unless you mean it's using it in the 'host' code somehow.
Assume glibc or any lib uses it, QEMU would have no real way of figuring
that out or instructing offending libraries to disable that, at least for
now ...

... making RT VMs less usable if any library uses guard regions. :(
This seems really stupid, to be honest. Unfortunately there's no way around
this, if software does stupid things then they get stupid prizes. There are
other ways mlock() and faulting in can fail too.
There is upcoming support for MCL_ONFAULT in QEMU [1] (see below).
Good.
[1] https://lkml.kernel.org/r/20250212173823.214429-3-peterx@redhat.com
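
For reference, an mlockall() call with MCL_ONFAULT looks roughly like the sketch below. This is not QEMU's code, only an illustration of the flag. It assumes libc headers that define MCL_ONFAULT, and lock_all_on_fault() is a made-up helper name.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Ask the kernel to lock current and future mappings, but only as pages
 * are faulted in, so nothing is populated up front.
 * Returns 0 on success or a negative errno value on failure.
 */
int lock_all_on_fault(void)
{
#ifdef MCL_ONFAULT
	if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) != 0)
		return -errno;
	return 0;
#else
	return -ENOSYS;	/* headers too old to know about MCL_ONFAULT */
#endif
}
```

A caller that wants the "exit if it fails" behaviour described above would check for a negative return and bail out. munlockall() undoes the request.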
E.g., QEMU has the option to use mlockall().
Then again we're currently asymmetric as you can add them _before_
mlock()'ing...
Right.

--
Cheers,

David / dhildenb
I think the _LOCKED idea is therefore kaput, because it just won't work
properly because populating guard regions fails.
Right, I think basic VM_LOCKED is out of the picture. VM_LOCKONFAULT might
be interesting, because we are skipping the population stage.
It fails because it tries to 'touch' the memory, but 'touching' guard
region memory causes a segfault. This kind of breaks the idea of
mlock()'ing guard regions.

I think adding workarounds to make this possible in any way is not really
worth it (and would probably be pretty gross).

We already document that 'mlock()ing lightweight guard regions will fail'
as per man page so this is all in line with that.
Right, and I claim that supporting VM_LOCKONFAULT might be as easy as
allowing install/remove of guard regions when that flag is set.
We already allow this flag! VM_LOCKED and VM_HUGETLB are the only flags we
disallow.
--
Cheers,

David / dhildenb