Thread: 33 messages, 6 authors, 2024-08-31

Re: [PATCH 00/16] mm: Introduce MAP_BELOW_HINT

From: Charlie Jenkins <hidden>
Date: 2024-08-28 21:40:01
Also in: linux-arch, linux-kselftest, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, lkml, loongarch, sparclinux

On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
quoted
* Charlie Jenkins [off-list ref] [240828 01:49]:
quoted
Some applications rely on placing data in the free bits of addresses allocated
by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
address returned by mmap to be less than the maximum address space,
unless the hint address is greater than this value.
Wait, what arch(s) allows for greater than the max?  The passed hint
should be where we start searching, but we go to the lower limit then
start at the hint and search up (or vice-versa on the directions).
I worded this awkwardly. On arm64 there is a page-table boundary at 48
bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
The max value mmap is able to return on arm64 is 48 bits if the hint
address uses 48 bits or less, even if the architecture supports 5-level
paging and thus addresses can be 52 bits. Applications can opt-in to
using up to 52-bits in an address by using a hint address greater than
48 bits. x86 has the same behavior but with 57 bits instead of 52.
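
To make the opt-in concrete, here is a minimal C sketch of passing a hint above the 48-bit boundary and checking what comes back. The helper names and the HINT_ABOVE_48_BITS value are made up for illustration, a 64-bit system is assumed, and whether the kernel actually hands out a wider address depends on the hardware and kernel configuration.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Illustrative hint: any page-aligned value above the 48-bit boundary. */
#define HINT_ABOVE_48_BITS ((void *)(uintptr_t)(UINT64_C(1) << 50))

/*
 * Hypothetical helper: anonymous private mapping, placed near `hint` if the
 * kernel can manage it. Returns MAP_FAILED on error, exactly like mmap().
 */
void *map_with_hint(void *hint, size_t len)
{
    assert(len > 0);
    return mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* True if any bit at position 48 or above is set in the address. */
int uses_more_than_48_bits(const void *p)
{
    return ((uint64_t)(uintptr_t)p >> 48) != 0;
}
```

Calling map_with_hint(NULL, len) stays within the default window, while map_with_hint(HINT_ABOVE_48_BITS, len) tells the kernel that wider addresses are acceptable; uses_more_than_48_bits() on the result shows which one was actually returned.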

The reason this exists is that some applications arbitrarily replace
bits in virtual addresses with data with an assumption that the address
will not be using any of the bits above bit 48 in the virtual address.
As hardware with larger address spaces was released, x86 decided to
build safety guards into the kernel to allow the applications that made
these assumptions to continue to work on this different hardware.
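
As an illustration of the kind of application described above, the following C sketch stores a 16-bit tag in the upper bits of a pointer. The 48-bit layout and every name here are assumptions chosen for the example, not code from any particular project.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: address in bits 0..47, application data in bits 48..63. */
#define ADDR_BITS 48u
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Pack a tag into the bits the application believes are unused. */
uint64_t tag_pointer(uint64_t addr, uint16_t tag)
{
    /* The precondition such applications silently rely on. */
    assert((addr & ~ADDR_MASK) == 0);
    return addr | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the address by masking off the tag bits. */
uint64_t untag_pointer(uint64_t tagged)
{
    return tagged & ADDR_MASK;
}

/* Recover the tag stored in the upper 16 bits. */
uint16_t pointer_tag(uint64_t tagged)
{
    return (uint16_t)(tagged >> ADDR_BITS);
}
```

If mmap ever returned an address with bit 48 or higher set, untag_pointer() would drop those bits and yield a different address, which is the breakage the 48-bit default is meant to prevent.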

This causes all applications that use a hint address to silently be
restricted to 48-bit addresses. The goal of this flag is to have a way
for applications to explicitly request how many bits they want mmap to
use.
quoted
I don't understand how unmapping works on a higher address; we would
fail to free it on termination of the application.

Also, there are archs that map outside of the VMAs, which are freed by
freeing from the prev->vm_end to next->vm_start, so I don't understand
what that looks like in this reality as well.
quoted
On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
flag allows applications a way to specify exactly how many bits they
want to be left unused by mmap. This eliminates the need for
applications to know the page table hierarchy of the system to be able
to reason which addresses mmap will be allowed to return.
But, why do they need to know today?  We have a limit for this don't we?
The limit is different for different architectures. On x86 the limit is
57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
application requires 10 bits free in a virtual address, the application
would always work on arm64 regardless of the hint address, but on x86 if
the hint address is greater than 48 bits then the application will not
work.

The goal of this flag is to have consistent and tunable behavior of
mmap() when it is desired to ensure that mmap() only returns addresses
that use some number of bits.
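
The arithmetic behind the 10-bit example can be sketched in C as follows: an application that needs 10 free bits can tolerate addresses of at most 64 - 10 = 54 bits, so a 52-bit address is fine and a 57-bit address is not. The function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in addr (0 for addr == 0). */
unsigned addr_width(uint64_t addr)
{
    unsigned bits = 0;
    while (addr != 0) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* True if the top `free_bits` bits of a 64-bit address are all zero. */
int leaves_bits_free(uint64_t addr, unsigned free_bits)
{
    assert(free_bits < 64);
    return addr_width(addr) <= 64 - free_bits;
}
```

With free_bits set to 10, the largest 52-bit address passes the check and the largest 57-bit address fails it, matching the arm64 and x86 cases above.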
quoted
Also, these upper limits are how some archs use the upper bits that you
are trying to use.
It does not eliminate the existing behavior of the architectures to
place these upper limits; it instead provides a way to have consistent
behavior across all architectures.
quoted
quoted
---
riscv made this feature of mmap returning addresses less than the hint
address the default behavior. This was in contrast to the implementation
of x86/arm64 that have a single boundary at the 5-level page table
region. However this restriction proved too great -- the reduced
address space when using a hint address was too small.
Yes, the hint is used to group things close together so it would
literally be random chance on if you have enough room or not (aslr and
all).
quoted
A patch for riscv [1] reverts the behavior that broke userspace. This
series serves to make this feature available to all architectures.
I don't fully understand this statement, you say it broke userspace so
now you are porting it to everyone?  This reads as if you are breaking
the userspace on all architectures :)
It was the default for mmap on riscv. The difference here is that it is now
enabled by a flag instead. Instead of making the flag specific to riscv,
I figured that other architectures might find it useful as well.
quoted
If you fail to find room below, then your application fails as there is
no way to get the upper bits you need.  It would be better to fix this
in userspace - if your application is returned too high an address, then
free it and exit because it's going to fail anyways.
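
For reference, the userspace-only approach suggested above might look roughly like this C sketch: map, check how many bits the result uses, and unmap and fail if it is too high. It does not use MAP_BELOW_HINT, and the helper name and its max_bits parameter are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes near `hint`, but reject the mapping
 * if any byte of it lies at or above (1 << max_bits).
 * Returns NULL on failure, with nothing left mapped.
 */
void *map_within_bits(void *hint, size_t len, unsigned max_bits)
{
    assert(len > 0 && max_bits >= 1 && max_bits <= 64);

    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Address of the last byte in the mapping. */
    uint64_t last = (uint64_t)(uintptr_t)p + (len - 1);
    if (max_bits < 64 && (last >> max_bits) != 0) {
        munmap(p, len); /* too high: free it and report failure */
        return NULL;
    }
    return p;
}
```

The caller decides what to do with a NULL return; in the scenario described above it would simply exit.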
This flag is trying to define an API that is more robust than the
current behavior on x86 and arm64, which implicitly restricts mmap()
addresses to 48 bits. A solution could be to just write in the docs that
mmap() will always exhaust all addresses below the hint address before
returning an address that is above the hint address. However a flag that
defines this behavior seems more intuitive.
quoted
quoted
I have only tested on riscv and x86.
This should be an RFC then.
Fair enough.
quoted
quoted
There is a tremendous amount of
duplicated code in mmap so the implementations across architectures I
believe should be mostly consistent. I added this feature to all
architectures that implement either
arch_get_mmap_end()/arch_get_mmap_base() or
arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
Way too much duplicate code.  We should be figuring out how to make this
all work with the same code.

This is going to make the cloned code problem worse.
That would require standardizing every architecture with the generic
mmap() framework that arm64 has developed. That is far outside the scope
of this patch, but would be a great area to research for each of the
architectures that do not use the generic framework.
Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().
- Charlie
quoted
quoted
Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]

To: Arnd Bergmann <arnd@arndb.de>
To: Paul Walmsley <redacted>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
To: Michael Ellerman <mpe@ellerman.id.au>
To: Nicholas Piggin <npiggin@gmail.com>
To: Christophe Leroy <redacted>
To: Naveen N Rao <naveen@kernel.org>
To: Muchun Song <muchun.song@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
To: Liam R. Howlett <redacted>
To: Vlastimil Babka <redacted>
To: Lorenzo Stoakes <redacted>
To: Thomas Gleixner <redacted>
To: Ingo Molnar <mingo@redhat.com>
To: Borislav Petkov <bp@alien8.de>
To: Dave Hansen <dave.hansen@linux.intel.com>
To: x86@kernel.org
To: H. Peter Anvin <hpa@zytor.com>
To: Huacai Chen <chenhuacai@kernel.org>
To: WANG Xuerui <kernel@xen0n.name>
To: Russell King <linux@armlinux.org.uk>
To: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
To: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
To: Helge Deller <deller@gmx.de>
To: Alexander Gordeev <agordeev@linux.ibm.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
To: Heiko Carstens <hca@linux.ibm.com>
To: Vasily Gorbik <gor@linux.ibm.com>
To: Christian Borntraeger <borntraeger@linux.ibm.com>
To: Sven Schnelle <svens@linux.ibm.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Rich Felker <dalias@libc.org>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
To: David S. Miller <davem@davemloft.net>
To: Andreas Larsson <andreas@gaisler.com>
To: Shuah Khan <shuah@kernel.org>
To: Alexandre Ghiti <redacted>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Palmer Dabbelt <redacted>
Cc: linux-riscv@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-mm@kvack.org
Cc: loongarch@lists.linux.dev
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Charlie Jenkins <redacted>

---
Charlie Jenkins (16):
      mm: Add MAP_BELOW_HINT
      riscv: mm: Do not restrict mmap address based on hint
      mm: Add flag and len param to arch_get_mmap_base()
      mm: Add generic MAP_BELOW_HINT
      riscv: mm: Support MAP_BELOW_HINT
      arm64: mm: Support MAP_BELOW_HINT
      powerpc: mm: Support MAP_BELOW_HINT
      x86: mm: Support MAP_BELOW_HINT
      loongarch: mm: Support MAP_BELOW_HINT
      arm: mm: Support MAP_BELOW_HINT
      mips: mm: Support MAP_BELOW_HINT
      parisc: mm: Support MAP_BELOW_HINT
      s390: mm: Support MAP_BELOW_HINT
      sh: mm: Support MAP_BELOW_HINT
      sparc: mm: Support MAP_BELOW_HINT
      selftests/mm: Create MAP_BELOW_HINT test

 arch/arm/mm/mmap.c                           | 10 ++++++++
 arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
 arch/loongarch/mm/mmap.c                     | 11 +++++++++
 arch/mips/mm/mmap.c                          |  9 +++++++
 arch/parisc/include/uapi/asm/mman.h          |  1 +
 arch/parisc/kernel/sys_parisc.c              |  9 +++++++
 arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
 arch/riscv/include/asm/processor.h           | 32 -------------------------
 arch/s390/mm/mmap.c                          | 10 ++++++++
 arch/sh/mm/mmap.c                            | 10 ++++++++
 arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
 arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
 fs/hugetlbfs/inode.c                         |  2 +-
 include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
 include/uapi/asm-generic/mman-common.h       |  1 +
 mm/mmap.c                                    |  2 +-
 tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
 tools/include/uapi/asm-generic/mman-common.h |  1 +
 tools/testing/selftests/mm/Makefile          |  1 +
 tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
 20 files changed, 216 insertions(+), 50 deletions(-)
---
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
-- 
- Charlie
  