Re: [RFC]: mm,power: introduce MADV_WIPEONSUSPEND
From: Michal Hocko <mhocko@kernel.org>
Date: 2020-07-07 09:14:57
Also in: linux-mm, linux-pm, virtualization
On Tue 07-07-20 10:01:23, Alexander Graf wrote:
> On 07.07.20 09:44, Michal Hocko wrote:
> > On Mon 06-07-20 14:52:07, Jann Horn wrote:
> > > On Mon, Jul 6, 2020 at 2:27 PM Alexander Graf [off-list ref] wrote:
> > > > Unless we create a vsyscall that returns both the PID as well as the epoch and thus handles fork *and* suspend. I need to think about this a bit more :).
> > >
> > > You can't reliably detect forking by checking the PID if it is possible for multiple forks to be chained before the reuse check runs:
> > >
> > >  - pid 1000 remembers its PID
> > >  - pid 1000 forks, creating child pid 1001
> > >  - pid 1000 exits and is waited on by init
> > >  - the pid allocator wraps around
> > >  - pid 1001 forks, creating child pid 1000
> > >  - child with pid 1000 tries to check for forking, determines that its PID is 1000, and concludes that it is still the original process
> >
> > I must be really missing something here because I really fail to see why there has to be something new even invented. Sure, checking for pid is certainly a suboptimal solution because pids are terrible tokens to work with. We do have a concept of file descriptors which are much better and support signaling. There is a clear source of the signal IIUC (migration) and there are consumers to act upon that (e.g. crypto backends). So what really prevents using a standard signal delivery over fd for this usecase?
>
> I wasn't part of the discussions on why things like WIPEONFORK were invented instead of just using signalling mechanisms, but the main reason I can think of is libraries.

Well, I would argue that WIPEONFORK is conceptually different. It is a one-time initialization mechanism with very clear lifetime semantics. So the programming model is really as simple as: the initial state is always 0 for a new task, without any surprises later on, because you own the memory (essentially an extension of the initialized .data section on exec to any new task). Compare that to the completely async nature of this interface. Any read would essentially have to be properly synchronized with the external event, otherwise the state could have been corrupted. Such a consistency model is really cumbersome to work with.
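To make the WIPEONFORK model above concrete, here is a minimal sketch, assuming Linux 4.14 or later and a private anonymous mapping. The helper name child_view_after_fork is hypothetical and only serves the illustration; the fallback definition of MADV_WIPEONFORK is an assumption for older libc headers.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18 /* Linux uapi value, assumed for older libc headers */
#endif

/*
 * Hypothetical helper: fill a WIPEONFORK region in the parent, fork, and
 * return the first byte the child observes in that region.
 * Returns -1 on any error (including kernels without MADV_WIPEONFORK).
 */
int child_view_after_fork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (secret == MAP_FAILED)
        return -1;

    /* Ask the kernel to present this range as zero-filled in any child. */
    if (madvise(secret, len, MADV_WIPEONFORK) != 0) {
        munmap(secret, len);
        return -1;
    }

    /* Parent-side state that must not leak into a forked child. */
    memset(secret, 0xAA, len);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(secret, len);
        return -1;
    }
    if (pid == 0)
        _exit(secret[0]); /* child: report what it sees via exit status */

    int status = 0;
    int ret = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        ret = WEXITSTATUS(status);

    munmap(secret, len);
    return ret;
}
```

On a kernel that supports the flag, the child should observe 0 while the parent's copy still holds 0xAA. The child never has to synchronize with anything: its view is fixed at the moment it starts.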

> As a library, you usually have no control over the main loop, which means you just don't have a way to poll for an fd. As a library author, I would usually try very hard to avoid creating such a dependency, because it makes it really hard to glue pieces together. The same applies to signals btw, which would also be a possible way to propagate such events.

Just to clarify, I didn't really mean POSIX signals here. Those would be quite clumsy indeed. But I can imagine a library registering with a system-wide mechanism to get a notification. There are many examples of that, including a lot of usage inside libraries: all the different *bus interfaces.
-- 
Michal Hocko
SUSE Labs
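
As a rough illustration of fd-based notification consumed by a library that does not own the main loop, the sketch below checks a descriptor without blocking at the library's own entry points. It is only a sketch under stated assumptions: an eventfd stands in for whatever system-wide notification source would actually be used, and the names lib_init, simulate_snapshot_event and lib_consume_event are hypothetical.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for a descriptor obtained from a real notification source. */
static int notify_fd = -1;

/* Hypothetical library init: obtain the notification descriptor. */
int lib_init(void)
{
    notify_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return notify_fd < 0 ? -1 : 0;
}

/* Stand-in for the external event (e.g. a migration) being signalled. */
int simulate_snapshot_event(void)
{
    uint64_t one = 1;
    ssize_t n = write(notify_fd, &one, sizeof(one));
    return n == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Called from the library's own entry points, so no main loop is needed.
 * Returns 1 if an event was pending (and consumes it), 0 if none, -1 on error.
 */
int lib_consume_event(void)
{
    struct pollfd p = { .fd = notify_fd, .events = POLLIN };

    int n = poll(&p, 1, 0); /* timeout 0: never blocks */
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;

    uint64_t count;
    if (read(notify_fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1; /* caller should now reinitialize its sensitive state */
}
```

A check like this still leaves a window between the check and the use of the state, which is the synchronization concern raised earlier in this mail.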