Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily... | linux-api

[PATCH v14 0/7] implement getrandom() in vDSO · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
[PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · Ingo Molnar <mingo@kernel.org> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · Ingo Molnar <mingo@kernel.org> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · Borislav Petkov <bp@alien8.de> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · Ingo Molnar <mingo@kernel.org> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · Ingo Molnar <mingo@kernel.org> · 2023-01-04
Re: [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-04
[PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Ingo Molnar <mingo@kernel.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Ingo Molnar <mingo@kernel.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Andy Lutomirski <luto@kernel.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Andy Lutomirski <luto@kernel.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-03
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Yann Droneaud <hidden> · 2023-01-05
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-05
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Andy Lutomirski" <luto@kernel.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Dr. Greg <hidden> · 2023-01-10
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Matthew Wilcox <willy@infradead.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · Linus Torvalds <torvalds@linux-foundation.org> · 2023-01-06
Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-06
[PATCH v14 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
[PATCH v14 4/7] random: add vgetrandom_alloc() syscall · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
[PATCH v14 6/7] random: introduce generic vDSO getrandom() implementation · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
[PATCH v14 5/7] arch: allocate vgetrandom_alloc() syscall number · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
[PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-01
Re: [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation · Christophe Leroy <hidden> · 2023-01-12
Re: [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation · "Jason A. Donenfeld" <Jason@zx2c4.com> · 2023-01-12
Re: [PATCH v14 0/7] implement getrandom() in vDSO · Mathieu Desnoyers <mathieu.desnoyers@efficios.com> · 2023-01-11

Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings

From: Ingo Molnar <mingo@kernel.org>
Date: 2023-01-03 18:16:03
Also in: linux-crypto, linux-mm, linux-patches, lkml

* Jason A. Donenfeld [off-list ref] wrote:

On Tue, Jan 03, 2023 at 11:50:43AM +0100, Ingo Molnar wrote:

quoted

* Jason A. Donenfeld [off-list ref] wrote:

quoted

The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available
  * Uh-oh: MAP_NORESERVE respects vm.overcommit_memory=2.
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fallback to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent. Instead, writes are simply lost.
d) It is inherited by fork.
e) It doesn't count against the mlock budget, since nothing is locked.

This is fairly simple to implement, with the one snag that we have to
use 64-bit VM_* flags, but this shouldn't be a problem, since the only
consumers will probably be 64-bit anyway.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

At the moment, rather than skipping writes on OOM, the fault handler
just returns to userspace, and the instruction is retried. This isn't
terrible, but it's not quite what is intended. The actual instruction
skipping has to be implemented arch-by-arch, but so does this whole
vDSO series, so that's fine. The following commit addresses it for x86.

Yeah, so VM_DROPPABLE adds a whole lot of complexity, corner cases, per 
arch low level work and rarely tested functionality (seriously, whose 
desktop system touches swap these days?), just so we can add a few pages of 
per thread vDSO data of a quirky type that in 99.999% of cases won't ever 
be 'dropped' from under the functionality that is using it and will thus 
bitrot fast?

It sounds like you've misunderstood the issue.

Firstly, the arch work is +19 lines (in the vdso branch of random.git).

For a single architecture: x86.

And it's only 19 lines because x86 already happens to have a bunch of 
complexity implemented, such as a safe instruction decoder that allows the 
skipping of an instruction - which relies on thousands of lines of 
complexity.

On an architecture where this isn't present, it would have to be 
implemented to support the instruction-skipping aspect of VM_DROPPABLE.

Even on x86, it's not common today for the software-decoder to be used in 
unprivileged code - primary use was debugging & instrumentation code. So 
your patches bring this piece of complexity to a much larger scope of 
untrusted user-space functionality.

That's very small and basic. Don't misrepresent it just to make a point.

I'm not misrepresenting anything.

Secondly, and more importantly, swapping this data is *not* permissible.

I did not suggest to swap it: my suggestion is to just pin these vDSO data 
pages. The per thread memory overhead is infinitesimal on the vast majority 
of the target systems, and the complexity trade-off you are proposing is 
poorly reasoned IMO.

Anyway:

  > Don't misrepresent it just to make a point.
  ...
  > That seems like a ridiculous rhetorical leap.
  ...
  > Did you actually read the commit message?

Frankly, I don't appreciate your condescending discussion style that 
borders on the toxic, and to save time I'm nacking this technical approach 
until both the patch-set and your reaction to constructive review feedback 
improves:

    NAcked-by: Ingo Molnar [off-list ref]

I think my core point that it would be much simpler to simply pin those 
pages and not introduce rarely-excercised 'discardable memory' semantics in 
Linux is a fair one - so it's straightforward to lift this NAK.

I'll re-evaluate the NACK on every new iteration of this patchset I see.

Thanks,

	Ingo

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help