Thread (33 messages) 33 messages, 6 authors, 2024-08-31

Re: [PATCH 00/16] mm: Introduce MAP_BELOW_HINT

From: Charlie Jenkins <hidden>
Date: 2024-08-29 07:14:28
Also in: linux-arch, linux-kselftest, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, lkml, loongarch, sparclinux

On Wed, Aug 28, 2024 at 07:19:36PM +0100, Lorenzo Stoakes wrote:
On Tue, Aug 27, 2024 at 10:49:06PM GMT, Charlie Jenkins wrote:
quoted
Some applications rely on placing data in free bits addresses allocated
by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
address returned by mmap to be less than the maximum address space,
unless the hint address is greater than this value.

On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
flag allows applications a way to specify exactly how many bits they
want to be left unused by mmap. This eliminates the need for
applications to know the page table hierarchy of the system to be able
to reason which addresses mmap will be allowed to return.

---
riscv made this feature of mmap returning addresses less than the hint
address the default behavior. This was in contrast to the implementation
of x86/arm64 that have a single boundary at the 5-level page table
region. However this restriction proved too great -- the reduced
address space when using a hint address was too small.

A patch for riscv [1] reverts the behavior that broke userspace. This
series serves to make this feature available to all architectures.
I'm a little confused as to the justification for this - you broke RISC V by
doing this, and have now reverted it, but now offer the same behaviour that
broke RISC V to all other architectures?

I mean this is how this reads, so I might be being ungenerous here :) but would
be good to clarify what the value-add is here.
Yeah I did not do a good job of explaining this! Having this be the
default behavior was broken, not that this feature in general was
broken.
I also wonder at use of a new MAP_ flag, they're a limited resource and we
should only ever add them if we _really_ need to. This seems a bit niche and
specific to be making such a big change for including touching a bunch of pretty
sensitive arch-specific code.

We have the ability to change how mmap() functions through 'personalities'
though of course this would impact every mmap() call in the process.

Overall I'm really not hugely convinced by this, it feels like userland
could find better ways of doing this (mostly you'd do a PROT_NONE mmap() to
reserve a domain and mprotect() it on allocation or mmap() over it).

So I just struggle to see the purpose myself. BUT absolutely I may be
missing context/others may have a view on the value of this. So happy to
stand corrected.
quoted
I have only tested on riscv and x86. There is a tremendous amount of
Yeah, OK this is crazy, you can't really submit something as non-RFC that
touches every single arch and not test it.

I also feel like we need more justification than 'this is a neat thing that
we use in RISC V sometimes' conceptually for such a big change.
I will send out a new version that does a much better job at explaining!
This is not something that is done on riscv ever currently. This is
something that is done on other architectures such as x86 and arm64.
This flag is to make similar behavior (the ability to force mmap to
return addresses that have a constrained address space) available to all
architectures.
Also your test program is currently completely broken afaict (have
commented on it directly). I also feel like your test program is a little
rudimentary, and should test some edge cases close to the limit etc.

So I think this is a NACK until there is testing across the board and a little
more justification.

Feel free to respin, but I think any future revisions should be RFC until
we're absolutely sure on testing/justification.

I appreciate your efforts here so sorry to be negative, but just obviously
want to make sure this is functional and trades off added complexity for
value for the kernel and userland :)
Totally understand thank you! After reviewing comments I have realized
that I made this much more complicated than it needs to be. This should
be able to be done without changing any architecture specific code. That
will mostly eliminate all of the complexity, but still has the downside
of consuming a MAP_ flag.

- Charlie
Thanks!
quoted
duplicated code in mmap so the implementations across architectures I
believe should be mostly consistent. I added this feature to all
architectures that implement either
arch_get_mmap_end()/arch_get_mmap_base() or
arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().

Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ (local) [1]

To: Arnd Bergmann <arnd@arndb.de>
To: Paul Walmsley <redacted>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
To: Michael Ellerman <mpe@ellerman.id.au>
To: Nicholas Piggin <npiggin@gmail.com>
To: Christophe Leroy <redacted>
To: Naveen N Rao <naveen@kernel.org>
To: Muchun Song <muchun.song@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>
To: Liam R. Howlett <redacted>
To: Vlastimil Babka <redacted>
To: Lorenzo Stoakes <redacted>
To: Thomas Gleixner <redacted>
To: Ingo Molnar <mingo@redhat.com>
To: Borislav Petkov <bp@alien8.de>
To: Dave Hansen <dave.hansen@linux.intel.com>
To: x86@kernel.org
To: H. Peter Anvin <hpa@zytor.com>
To: Huacai Chen <chenhuacai@kernel.org>
To: WANG Xuerui <kernel@xen0n.name>
To: Russell King <linux@armlinux.org.uk>
To: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
To: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
To: Helge Deller <deller@gmx.de>
To: Alexander Gordeev <agordeev@linux.ibm.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
To: Heiko Carstens <hca@linux.ibm.com>
To: Vasily Gorbik <gor@linux.ibm.com>
To: Christian Borntraeger <borntraeger@linux.ibm.com>
To: Sven Schnelle <svens@linux.ibm.com>
To: Yoshinori Sato <ysato@users.sourceforge.jp>
To: Rich Felker <dalias@libc.org>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
To: David S. Miller <davem@davemloft.net>
To: Andreas Larsson <andreas@gaisler.com>
To: Shuah Khan <shuah@kernel.org>
To: Alexandre Ghiti <redacted>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Palmer Dabbelt <redacted>
Cc: linux-riscv@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-mm@kvack.org
Cc: loongarch@lists.linux.dev
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Charlie Jenkins <redacted>

---
Charlie Jenkins (16):
      mm: Add MAP_BELOW_HINT
      riscv: mm: Do not restrict mmap address based on hint
      mm: Add flag and len param to arch_get_mmap_base()
      mm: Add generic MAP_BELOW_HINT
      riscv: mm: Support MAP_BELOW_HINT
      arm64: mm: Support MAP_BELOW_HINT
      powerpc: mm: Support MAP_BELOW_HINT
      x86: mm: Support MAP_BELOW_HINT
      loongarch: mm: Support MAP_BELOW_HINT
      arm: mm: Support MAP_BELOW_HINT
      mips: mm: Support MAP_BELOW_HINT
      parisc: mm: Support MAP_BELOW_HINT
      s390: mm: Support MAP_BELOW_HINT
      sh: mm: Support MAP_BELOW_HINT
      sparc: mm: Support MAP_BELOW_HINT
      selftests/mm: Create MAP_BELOW_HINT test

 arch/arm/mm/mmap.c                           | 10 ++++++++
 arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
 arch/loongarch/mm/mmap.c                     | 11 +++++++++
 arch/mips/mm/mmap.c                          |  9 +++++++
 arch/parisc/include/uapi/asm/mman.h          |  1 +
 arch/parisc/kernel/sys_parisc.c              |  9 +++++++
 arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
 arch/riscv/include/asm/processor.h           | 32 -------------------------
 arch/s390/mm/mmap.c                          | 10 ++++++++
 arch/sh/mm/mmap.c                            | 10 ++++++++
 arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
 arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
 fs/hugetlbfs/inode.c                         |  2 +-
 include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
 include/uapi/asm-generic/mman-common.h       |  1 +
 mm/mmap.c                                    |  2 +-
 tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
 tools/include/uapi/asm-generic/mman-common.h |  1 +
 tools/testing/selftests/mm/Makefile          |  1 +
 tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
 20 files changed, 216 insertions(+), 50 deletions(-)
---
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
--
- Charlie


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help