Re: [PATCH V5 0/7] Allow user to request memory to be locked on page fault

From: Michal Hocko <mhocko@kernel.org>
Date: 2015-07-29 10:45:41
Also in: linux-alpha, linux-arch, linux-mips, linux-mm, linuxppc-dev, lkml, sparclinux

On Tue 28-07-15 09:49:42, Eric B Munson wrote:

On Tue, 28 Jul 2015, Michal Hocko wrote:

quoted

[I am sorry but I didn't get to this sooner.]

On Mon 27-07-15 10:54:09, Eric B Munson wrote:

quoted

Now that VM_LOCKONFAULT is a modifier to VM_LOCKED and
cannot be specified independentally, it might make more sense to mirror
that relationship to userspace.  Which would lead to soemthing like the
following:

A modifier makes more sense.

quoted

To lock and populate a region:
mlock2(start, len, 0);

To lock on fault a region:
mlock2(start, len, MLOCK_ONFAULT);

If LOCKONFAULT is seen as a modifier to mlock, then having the flags
argument as 0 mean do mlock classic makes more sense to me.

To mlock current on fault only:
mlockall(MCL_CURRENT | MCL_ONFAULT);

To mlock future on fault only:
mlockall(MCL_FUTURE | MCL_ONFAULT);

To lock everything on fault:
mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

Makes sense to me. The only remaining and still tricky part would be
the munlock{all}(flags) behavior. What should munlock(MLOCK_ONFAULT)
do? Keep locked and poppulate the range or simply ignore the flag an
just unlock?

I can see some sense to allow munlockall(MCL_FUTURE[|MLOCK_ONFAULT]),
munlockall(MCL_CURRENT) resp. munlockall(MCL_CURRENT|MCL_FUTURE) but
other combinations sound weird to me.

Anyway munlock with flags opens new doors of trickiness.

In the current revision there are no new munlock[all] system calls
introduced.  munlockall() unconditionally cleared both MCL_CURRENT and
MCL_FUTURE before the set and now unconditionally clears all three.
munlock() does the same for VM_LOCK and VM_LOCKONFAULT.

OK if new munlock{all}(flags) is not introduced then this is much saner
IMO.

If the user
wants to adjust mlockall flags today, they need to call mlockall a
second time with the new flags, this remains true for mlockall after
this set and the same behavior is mirrored in mlock2.

OK, this makes sense to me.

The only
remaining question I have is should we have 2 new mlockall flags so that
the caller can explicitly set VM_LOCKONFAULT in the mm->def_flags vs
locking all current VMAs on fault.  I ask because if the user wants to
lock all current VMAs the old way, but all future VMAs on fault they
have to call mlockall() twice:

	mlockall(MCL_CURRENT);
	mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);

This has the side effect of converting all the current VMAs to
VM_LOCKONFAULT, but because they were all made present and locked in the
first call, this should not matter in most cases.

I think this is OK (worth documenting though) considering that ONFAULT
is just modifier for the current mlock* operation. The memory is locked
the same way for both - aka once the memory is present you do not know
whether it was done during mlock call or later during the fault.

The catch is that,
like mmap(MAP_LOCKED), mlockall() does not communicate if mm_populate()
fails.  This has been true of mlockall() from the beginning so I don't
know if it needs more than an entry in the man page to clarify (which I
will add when I add documentation for MCL_ONFAULT).

Yes this is true but unlike mmap it seems fixable I guess. We do not have
to unmap and we can downgrade mmap_sem to read and the fault so nobody
can race with a concurent mlock.

In a much less
likely corner case, it is not possible in the current setup to request
all current VMAs be VM_LOCKONFAULT and all future be VM_LOCKED.

Vlastimil has already pointed that out. MCL_FUTURE doesn't clear
MCL_CURRENT. I was quite surprised in the beginning but it makes a
perfect sense. mlockall call shouldn't lead into munlocking, that would
be just weird. Clearing MCL_FUTURE on MCL_CURRENT makes sense on the
other hand because the request is explicit about _current_ memory and it
doesn't lead to any munlocking.

-- 
Michal Hocko
SUSE Labs

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help