Re: [PATCH RFC v2 0/4] mm: Introduce MAP_BELOW_HINT

From: Lorenzo Stoakes <hidden>
Date: 2024-08-29 09:56:57
Also in: linux-alpha, linux-arch, linux-kselftest, linux-mips, linux-mm, linux-s390, linux-sh, lkml, loongarch, sparclinux

On Thu, Aug 29, 2024 at 09:42:22AM GMT, Lorenzo Stoakes wrote:

On Thu, Aug 29, 2024 at 12:15:57AM GMT, Charlie Jenkins wrote:

> Some applications rely on placing data in the free bits of addresses
> allocated by mmap. Various architectures (e.g. x86, arm64, powerpc)
> restrict the address returned by mmap to be below the 48-bit address
> space, unless the hint address uses more than 47 bits (the 48th bit is
> reserved for the kernel address space).
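The pointer-tagging pattern described in the quoted text can be sketched roughly as follows. This is an illustrative sketch, not code from OpenJDK or Go: it assumes user addresses fit in the low 48 bits, and the helper names `tag_pointer`, `untag_pointer` and `pointer_tag` are hypothetical. If mmap() hands back an address above that limit (as can happen on sv57), the helper refuses to tag it, which is exactly the failure mode these applications hit.

```c
#include <stdint.h>

/* Assumption: user-space addresses fit in the low 48 bits. Hypothetical
 * helpers for illustration only. */
#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Store a 16-bit tag in the unused high bits. Returns 0 on success, or -1 if
 * the pointer already uses bits at or above ADDR_BITS and cannot be tagged. */
static int tag_pointer(const void *p, uint16_t tag, uint64_t *out)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    if (addr & ~ADDR_MASK)
        return -1;
    *out = addr | ((uint64_t)tag << ADDR_BITS);
    return 0;
}

/* Strip the tag to recover a dereferenceable pointer. */
static void *untag_pointer(uint64_t tagged)
{
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}

/* Extract the tag from a tagged value. */
static uint16_t pointer_tag(uint64_t tagged)
{
    return (uint16_t)(tagged >> ADDR_BITS);
}
```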

I'm still confused as to why, if an mmap flag is desired (and thus
programs have to be heavily modified and controlled to be able to do
this), you can't just do an mmap() with PROT_NONE early, around a hinted
address that sits below the required limit, and then mprotect() or mmap()
over it?
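A minimal userspace sketch of that reserve-then-commit approach follows, assuming Linux and glibc. The helper names and the 47-bit limit are hypothetical choices for illustration. Because a non-MAP_FIXED hint is only advisory, the sketch checks where the reservation actually landed before using it.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Reserve `size` bytes of inaccessible address space, ideally at `hint`.
 * The hint is advisory, so the result is checked against `limit` and
 * released if the kernel placed it too high. Hypothetical helper. */
static void *reserve_below(void *hint, size_t size, uintptr_t limit)
{
    void *p = mmap(hint, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p + size > limit) {
        munmap(p, size);
        return NULL;
    }
    return p;
}

/* Make part of an existing reservation usable. Since the range is already
 * ours, mprotect() cannot clobber anyone else's mapping. */
static void *commit_in_reservation(void *at, size_t len)
{
    return mprotect(at, len, PROT_READ | PROT_WRITE) == 0 ? at : NULL;
}
```

The process's own allocator would then hand out memory only from inside the reservation. As discussed further down, this does nothing about code that calls mmap() or mremap() directly.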

Your feature is a major adjustment to mmap(); it needs to be pretty
significantly justified, especially if it takes up a new flag.

> The riscv architecture needs a way to similarly restrict the virtual
> address space. On the riscv port of OpenJDK an error is thrown if it is
> run on the 57-bit address space, called sv57 [1]. golang has a comment
> that sv57 support is not complete, but there are some workarounds to get
> it to mostly work [2].
>
> These applications work on x86 because x86 implicitly restricts mmap()
> addresses to 47 bits when the hint address fits within 47 bits.

You mean x86 _has_ to limit addresses to the bits the hardware actually
implements, in canonical form :) This will not be the case with 5-level
page tables, though...
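The hint-based x86-64 behaviour described above can be modelled roughly as below. This is a simplified userspace model, not kernel code: it is only loosely based on the kernel's DEFAULT_MAP_WINDOW / TASK_SIZE_MAX distinction, it ignores the guard page at the top of each window, and it ignores other cases the real code handles (32-bit syscalls, MAP_FIXED). The function name `search_limit` is hypothetical.

```c
#include <stdint.h>

/* Assumed windows: 47-bit (4-level paging) and 56-bit (5-level paging). */
#define WINDOW_47 (UINT64_C(1) << 47)
#define WINDOW_56 (UINT64_C(1) << 56)

/* Exclusive upper bound the allocator may search, given the mmap() hint.
 * With 5-level paging, only a hint above the 47-bit window opts the caller
 * into the larger address space; otherwise allocations stay below 2^47. */
static uint64_t search_limit(uint64_t hint, int five_level)
{
    if (five_level && hint > WINDOW_47)
        return WINDOW_56;
    return WINDOW_47;
}
```

In this model, an application that never passes a high hint keeps getting 47-bit addresses even on 5-level hardware, which is the implicit opt-out behaviour the patch contrasts with an explicit opt-in flag.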

> Instead of implicitly restricting the address space on riscv (or any
> current/future architecture), a flag would allow users to opt in to this
> behavior rather than opting out as is done on other architectures. This
> is desirable because only a small class of applications do pointer
> masking.

I raised this last time and you didn't seem to address it, so to be more
blunt:

I don't understand why this needs to be an mmap() flag. From this it seems
the whole process needs allocations to be below a certain limit.

That _could_ be achieved through a 'personality' or similar (though a
personality is on/off rather than configurable, so something else might
be needed).

From what you're saying, 57-bit is all you really need, right? So maybe
ADDR_LIMIT_57BIT?

I don't see how you're going to actually enforce this in a process either
via an mmap flag, as a library might decide not to use it, so you'd need to
control the allocator, the thread library implementation, and everything
that might allocate.

Liam also raised various points about VMA particulars that I'm not sure are
addressed either.

I just find it hard to believe that everything will fit together.

I'd _really_ need to be convinced that this MAP_ flag is justified, and I'm
just not.

> This flag will also allow seamless compatibility between all
> architectures, so applications like Go and OpenJDK that use bits in a
> virtual address can request the exact number of bits they need in a
> generic way. The flag can be checked inside vm_unmapped_area() so that
> this flag does not have to be handled individually by each architecture.

I'm still very unconvinced and feel the bar needs to be high for making
changes like this that carry maintainership burden.

So for me, it's really a no as an overall concept.

Happy to be convinced otherwise, however... (I may be missing details or
context that provide more justification).

Some more thoughts:

* If you absolutely must keep allocations below a certain limit, you'd
  probably need to actually associate this information with the VMA so the
  memory can't be mremap()'d somewhere invalid (you might not control all
  code so you can't guarantee this won't happen).
* Keeping a map limit associated with a VMA would be horrid: keeping VMAs
  as small as possible is a key aim, so that'd be a no-go. VMA flags are
  also in limited supply.
* If we did implement a per-process limit but made it arbitrary, we'd then
  have to handle all kinds of corner cases forever (this is UAPI, we can't
  break it, etc.) such as crazy-low values, or determine a minimum that
  might vary by arch...
* If we did this we'd absolutely have to implement a check in the brk()
  implementation, which is a very very sensitive bit of code. And of
  course, in mmap() and mremap()... and any arch-specific code that might
  interface with this stuff (these functions are hooked).
* A fixed address limit would make more sense, but it seems difficult to
  know what would work for everybody, and again we'd have to deal with edge
  cases and having a permanent maintenance burden.
* If you did have a map flag what about merging between VMAs above the
  limit and below it? To avoid that you'd need to implement some kind of a
  'VMA flag that has an arbitrary characteristic' or a 'limit' field,
  adjust all the 'can VMA merge' functions and write extensive testing and
  none of that is frankly acceptable.
* We have some 'weird' arches that might have problems with certain
  virtual address ranges or require arbitrary mappings at a certain address
  range that a limit might not be able to account for.
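To illustrate the point that a process cannot guarantee every mapping stays below a limit when it does not control all code, the most userspace can do is audit after the fact. The sketch below is Linux-specific and counts the current process's mappings that end above a given limit by parsing /proc/self/maps. The helper name is hypothetical. It skips the x86-64 `[vsyscall]` page, which lives in the kernel half of the address space, and it enforces nothing: a library could still mremap() or mmap() above the limit the moment after the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count mappings in this process whose end address lies above `limit`,
 * or return -1 on error. Parses lines of the form "start-end perms ...". */
static long count_mappings_above(uint64_t limit)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char *line = NULL;
    size_t cap = 0;
    long n = 0;
    while (getline(&line, &cap, f) != -1) {
        uint64_t start, end;
        if (strstr(line, "[vsyscall]"))
            continue;   /* kernel-provided page, not a user allocation */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) == 2 &&
            end > limit)
            n++;
    }
    free(line);
    fclose(f);
    return n;
}
```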

I'm absolutely opposed to a new MAP_ flag for this, but even if you
implemented that, it implies a lot of complexity.

It implies even more complexity if you implement something per-process
except if it were a fixed limit.

And if you implement a fixed limit, it's hard to see that it'll be
acceptable to everybody, and I suspect we'd still run into some possible
weirdness.

So again, I'm struggling to see how this concept can be justified in any
form.
