Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND

From: Michal Hocko <mhocko@kernel.org>
Date: 2020-07-07 09:14:57
Also in: linux-mm, linux-pm, virtualization

On Tue 07-07-20 10:01:23, Alexander Graf wrote:

On 07.07.20 09:44, Michal Hocko wrote:

On Mon 06-07-20 14:52:07, Jann Horn wrote:

On Mon, Jul 6, 2020 at 2:27 PM Alexander Graf [off-list ref] wrote:

Unless we create a vsyscall that returns both the PID as well as the
epoch and thus handles fork *and* suspend. I need to think about this a
bit more :).

You can't reliably detect forking by checking the PID if it is
possible for multiple forks to be chained before the reuse check runs:

  - pid 1000 remembers its PID
  - pid 1000 forks, creating child pid 1001
  - pid 1000 exits and is waited on by init
  - the pid allocator wraps around
  - pid 1001 forks, creating child pid 1000
  - child with pid 1000 tries to check for forking, determines that its
    PID is 1000, and concludes that it is still the original process
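
The sequence above can be replayed with a toy model. The following is a
minimal sketch in C, not kernel code: the `pid_table` allocator, its
two-entry PID range, and all function names are assumptions made up for
illustration. It only shows that comparing a remembered PID with the
current one cannot tell the original process apart from a descendant that
received a recycled PID.

```c
#include <stdbool.h>
#include <stddef.h>

#define PID_MIN 1000   /* hypothetical: smallest PID the toy allocator hands out */
#define PID_MAX 1001   /* hypothetical: tiny PID space so wrap-around is immediate */

/* Toy process table: in_use[i] tracks whether PID_MIN + i is allocated. */
struct pid_table {
    bool in_use[PID_MAX - PID_MIN + 1];
    int last;          /* last PID handed out */
};

/* Allocate the next free PID after `last`, wrapping around; -1 if full. */
static int pid_alloc(struct pid_table *t)
{
    int span = PID_MAX - PID_MIN + 1;

    for (int i = 1; i <= span; i++) {
        int pid = PID_MIN + (t->last - PID_MIN + i) % span;

        if (!t->in_use[pid - PID_MIN]) {
            t->in_use[pid - PID_MIN] = true;
            t->last = pid;
            return pid;
        }
    }
    return -1;
}

/* Release a PID, as happens once an exited process has been waited on. */
static void pid_free(struct pid_table *t, int pid)
{
    t->in_use[pid - PID_MIN] = false;
}

/* The flawed check: "my PID equals the one I remembered, so I did not fork". */
static bool looks_unforked(int remembered_pid, int current_pid)
{
    return remembered_pid == current_pid;
}

/* Replays the scenario; returns true if the grandchild is misdetected. */
static bool replay_pid_reuse(void)
{
    struct pid_table t = { .in_use = { false }, .last = PID_MAX };

    int original = pid_alloc(&t);    /* pid 1000 */
    int remembered = original;       /* the value fork copies into children */
    int child = pid_alloc(&t);       /* pid 1000 forks: child is 1001 */
    pid_free(&t, original);          /* pid 1000 exits and is reaped */
    int grandchild = pid_alloc(&t);  /* allocator wraps: 1000 again */

    (void)child;
    return looks_unforked(remembered, grandchild);
}
```

Calling `replay_pid_reuse()` returns true: the grandchild holds PID 1000
and the check wrongly reports that no fork happened.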

I must be really missing something here because I really fail to see why
anything new has to be invented at all. Sure, checking the pid is
certainly a suboptimal solution because pids are terrible tokens to work
with. We do have the concept of file descriptors, which are much better and
support signaling. There is a clear source of the signal IIUC
(migration) and there are consumers to act upon it (e.g. crypto
backends). So what really prevents using standard signal delivery
over an fd for this use case?
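
For illustration only, here is a rough C sketch of what notification over a
file descriptor could look like from a consumer's side. The thread does not
define any concrete interface: `eventfd(2)` merely stands in for a
hypothetical migration notifier, and the `notifier_*` names are invented
here.

```c
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical notification source; eventfd is only a stand-in. */
static int notifier_create(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Producer side: record that one event (e.g. a migration) happened. */
static bool notifier_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one);
}

/*
 * Consumer side: non-blocking check.
 * Returns the number of events since the last call, 0 if there were
 * none, or -1 on error.
 */
static int64_t notifier_consume(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    uint64_t count;
    int ready = poll(&p, 1, 0);   /* timeout 0: do not wait */

    if (ready < 0)
        return -1;
    if (ready == 0 || !(p.revents & POLLIN))
        return 0;
    /* Reading an eventfd returns the counter and resets it to zero. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

A consumer such as a crypto backend could call `notifier_consume()` and
reinitialize its state whenever the result is greater than zero.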

I wasn't part of the discussions on why things like WIPEONFORK were invented
instead of just using signalling mechanisms, but the main reason I can think
of is libraries.

Well, I would argue that WIPEONFORK is conceptually different. It is
a one-time initialization mechanism with very clear lifetime semantics.
So the programming model is really as simple as: the initial state is
always 0 for a new task, without any surprises later on, because you own
the memory (essentially an extension of the initialized .data section on
exec to any new task).
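
As a concrete illustration of that model, the following C sketch marks an
anonymous private mapping with `madvise(MADV_WIPEONFORK)` (available since
Linux 4.14), fills it in the parent, and lets a forked child inspect it.
The function name and the 0xA5 fill pattern are assumptions chosen for the
example, and the fallback definition of the constant is only there for
older libc headers.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS and madvise() */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18     /* Linux UAPI value; older libc headers may lack it */
#endif

/*
 * Returns 1 if the child saw an all-zero page, 0 if it saw the parent's
 * data, and -1 if the setup failed (e.g. the kernel lacks WIPEONFORK).
 */
static int child_sees_wiped_page(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int status;
    int result = -1;
    pid_t pid;

    if (buf == MAP_FAILED)
        return -1;
    if (madvise(buf, len, MADV_WIPEONFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    memset(buf, 0xA5, len);    /* parent-only state, e.g. RNG seed material */

    pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: the range should read back as zeroes. */
        for (size_t i = 0; i < len; i++)
            if (buf[i] != 0)
                _exit(1);
        _exit(0);
    }

    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0) ? 1 : 0;

    /* The parent's copy must be untouched by the fork. */
    if (buf[0] != 0xA5)
        result = -1;

    munmap(buf, len);
    return result;
}
```

On a kernel that supports the flag, `child_sees_wiped_page()` is expected to
return 1: the child starts from zeroed memory at a well-defined point (the
fork), while the parent keeps its data.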

Compare that to the completely async nature of this interface. Any read
would essentially have to be properly synchronized with the external
event, otherwise the state could have been corrupted. Such a consistency
model is really cumbersome to work with.

As a library, you usually have no control over the main loop, which means
you just don't have a way to poll for an fd. As a library author, I would
usually try very hard to avoid creating such a dependency, because it makes
it really hard to glue pieces together.

The same applies to signals btw, which would also be a possible way to
propagate such events.

Just to clarify, I didn't really mean POSIX signals here. Those would be
quite clumsy indeed. But I can imagine a library registering with a
system-wide mechanism to get a notification. There are many examples of
that, including a lot of usage inside libraries: all the different *bus
interfaces.

-- 
Michal Hocko
SUSE Labs
