Thread: 12 messages, 4 authors, 2021-07-27

Re: [PATCH v2] mm: Enable suspend-only swap spaces

From: Evan Green <hidden>
Date: 2021-07-14 22:40:06
Also in: linux-mm, lkml

On Tue, Jul 13, 2021 at 10:42 PM Michal Hocko [off-list ref] wrote:
On Mon 12-07-21 14:32:05, Evan Green wrote:
quoted
On Mon, Jul 12, 2021 at 12:03 AM Michal Hocko [off-list ref] wrote:
quoted
[Cc linux-api]

On Fri 09-07-21 10:50:48, Evan Green wrote:
quoted
Currently it's not possible to enable hibernation without also enabling
generic swap for a given swap area. These two use cases are not the
same. For example there may be users who want to enable hibernation,
but whose drives don't have the write endurance for generic swap
activities.

Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow
generic swapping to it. This region can still be wired up for use in
suspend-to-disk activities, but will never have regular pages swapped to
it.
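
As a rough illustration of how such a flag would compose with the existing swapon(2) flag word, here is a minimal sketch. SWAP_FLAG_PREFER, SWAP_FLAG_PRIO_MASK and SWAP_FLAG_DISCARD use the values from include/linux/swap.h. The numeric value given to SWAP_FLAG_NOSWAP is a placeholder assumed for illustration only, and swapon_flags() is a hypothetical helper, not part of the patch.

```python
# Existing swapon(2) flag bits, as defined in include/linux/swap.h.
SWAP_FLAG_PREFER = 0x8000      # a priority is encoded in the low bits
SWAP_FLAG_PRIO_MASK = 0x7FFF
SWAP_FLAG_PRIO_SHIFT = 0
SWAP_FLAG_DISCARD = 0x10000

# ASSUMPTION: placeholder bit for the proposed flag. The real value is
# whatever the patch defines; this one is for illustration only.
SWAP_FLAG_NOSWAP = 0x80000


def swapon_flags(priority=None, noswap=False):
    """Build the flags argument that would be passed to swapon(2)."""
    flags = 0
    if priority is not None:
        if not 0 <= priority <= SWAP_FLAG_PRIO_MASK:
            raise ValueError("priority out of range")
        flags |= SWAP_FLAG_PREFER
        flags |= (priority << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK
    if noswap:
        # Region is registered, but generic swap must never use it.
        flags |= SWAP_FLAG_NOSWAP
    return flags
```

A hibernate-only region would be registered once at early init with noswap=True, and a regular swap region with a priority and no such bit.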
Could you expand some more on why a strict exclusion is really
necessary? I do understand that one might not want to have swap storage
available all the time but considering that swapon is really a light
operation so something like the following should be a reasonable
workaround, no?
        swapon storage/file
        s2disk
        swapoff storage
Broadly, it seemed like a reasonable thing for the kernel to be able
to do. The workaround you suggest does work for some use cases, but it
seems like a gap the kernel could more naturally fill.

Without getting too far into the weeds, there are a handful of factors
that make this change particularly useful to me:

 * Slicing off part of your SSD to be SLC (single level cell) is
expensive. From what I understand you gain endurance and speed at the
cost of 3-4x capacity. In other words for every 1GB of SLC space you
need for swap, it costs you 3-4GB of storage space out of the primary
namespace. So I'm incentivized to size this region as small as
possible. Hibernate's speed/endurance requirements are not quite as
harsh as regular swap. Steering them separately gives me the ability
to put the hibernate image in regular storage, and not be forced to
oversize expensive/fast swap space.
OK, this is likely true but it doesn't really explain/justify a
dedicated swap storage for hibernation.
Wait, yes it does. Hibernation has less stringent write endurance and
speed requirements than swap, so it makes sense to point it at storage
that doesn't pay the 3x capacity penalty, and save the fancy fast
stuff for swap. The exclusivity makes sense since you're trying not to
wear out your higher capacity storage with unnecessary writes. I'd
argue the API addition is worth it for this reason by itself. Usermode
has valid reasons for wanting to disentangle these.
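
The capacity argument can be put into numbers with a small sketch. It assumes the 3x end of the "3-4x" figure above, and it simplifies by assuming a combined region must be sized for swap plus the hibernate image. The function names and the example sizes are hypothetical.

```python
def primary_capacity_cost_gb(slc_gb, penalty=3.0):
    """Primary-namespace capacity given up to carve out slc_gb of SLC."""
    return slc_gb * penalty


def compare_layouts(swap_gb, hibernate_gb, penalty=3.0):
    """Return (combined_cost, split_cost) in GB of primary capacity.

    combined: swap and hibernate share one SLC region.
    split:    only swap lives in SLC; the hibernate image sits in
              regular storage and costs its size 1:1.
    """
    combined = primary_capacity_cost_gb(swap_gb + hibernate_gb, penalty)
    split = primary_capacity_cost_gb(swap_gb, penalty) + hibernate_gb
    return combined, split


# Hypothetical sizes: 4 GB of swap, 8 GB hibernate image.
combined, split = compare_layouts(4, 8)
print(f"combined: {combined} GB, split: {split} GB")
```

With these assumed sizes, steering hibernate to regular storage costs 20 GB of primary capacity instead of 36 GB.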
quoted
 * Even with the workaround, swap can end up in the hibernate region.
Hibernate starts by allocating its giant 50%-of-memory region, which
is often the forcing function for pushing things into swap. With the
workaround, even if my hibernate region is in last priority, there's
still a reasonable chance I'll end up swapping into it.
Right, there is no guarantee, but why does that matter at all? From the
kernel point of view it doesn't really make much difference what was
the source of the swapout.
quoted
If I have
different security designs for swap space and hibernate, then even a
chance of some swap leaking into this region is a problem.
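
The spill-over concern can be shown with a toy model of priority-ordered swap placement: higher-priority areas fill first, and the overflow lands in lower-priority ones. This is a simplified sketch, not the kernel allocator. SwapArea, place_pages() and the noswap field are hypothetical names used only to model the proposed flag.

```python
from dataclasses import dataclass


@dataclass
class SwapArea:
    name: str
    priority: int
    free_pages: int
    noswap: bool = False  # hypothetical: models the proposed flag


def place_pages(areas, pages, honor_noswap):
    """Place `pages` swapped-out pages, highest priority first.

    Returns (pages placed per area, pages that found no room).
    """
    placed = {a.name: 0 for a in areas}
    for area in sorted(areas, key=lambda a: a.priority, reverse=True):
        if honor_noswap and area.noswap:
            continue  # region is reserved for the hibernate image
        take = min(pages, area.free_pages)
        placed[area.name] = take
        pages -= take
        if pages == 0:
            break
    return placed, pages


areas = [
    SwapArea("swap", priority=10, free_pages=100),
    SwapArea("hibernate", priority=-1, free_pages=1000, noswap=True),
]
print(place_pages(areas, 150, honor_noswap=False))
print(place_pages(areas, 150, honor_noswap=True))
```

Without the exclusion, the 50 pages that do not fit in the swap area end up in the hibernate region even though it has the lowest priority. With it, they are left unplaced instead.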
Could you expand some more on this part please?
Offline attacks (i.e., manipulating storage from underneath the machine)
are a major concern when enabling both swap and hibernate. But the
approach of adding integrity to mitigate offline attacks may differ
between swap and hibernate in the interest of performance. Swap for
instance essentially needs a per-page dictionary of hashes for
integrity, since pages can be added and removed arbitrarily. Hibernate
however just needs a single hash across the entire image to provide
integrity. If you have swap leaking onto a region where you don't have
integrity enabled (because say you handled integrity at the image
level for hibernate, and at the block layer for swap), your swap
integrity story is compromised.
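
To make the difference between the two integrity shapes concrete, here is a minimal sketch: a per-slot table of hashes for swap, where pages come and go individually, versus one hash over the whole image for hibernate. SHA-256 and the class and function names are assumptions made for illustration; a real design would also have to protect the hashes themselves (for example with a keyed MAC), which this sketch leaves out.

```python
import hashlib
import hmac

PAGE_SIZE = 4096


class SwapIntegrity:
    """Per-page integrity: one hash per swap slot, kept in memory."""

    def __init__(self):
        self._hashes = {}

    def swap_out(self, slot, page):
        # Record the hash at write time; slots are reused arbitrarily.
        self._hashes[slot] = hashlib.sha256(page).digest()

    def swap_in(self, slot, page):
        # Verify what came back from storage and retire the entry.
        expected = self._hashes.pop(slot)
        actual = hashlib.sha256(page).digest()
        return hmac.compare_digest(expected, actual)


def image_hash(pages):
    """Whole-image integrity: a single hash across every page, in order."""
    h = hashlib.sha256()
    for page in pages:
        h.update(page)
    return h.digest()
```

A page swapped into a region covered only by the image-level hash has no entry in any per-page table, so a modification made to it offline would go undetected.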

There's a (likely defunct) series from Matthew Garrett that expounds a
bit on some of this, though it's also partially tangential:
https://lore.kernel.org/lkml/20210220013255.1083202-1-matthewgarrett@google.com/
quoted
 * I also want to limit the online attack surface that swap presents.
I can make headway here by disallowing open() calls on active swap
regions (via an LSM), and permanently disabling swapon/swapoff system
calls after early init. The workaround isn't great for me because I
want to set everything up at early init time and then not touch it. By
suspend time, on my system I no longer have the ability to make
swapon/swapoff calls.
This is clearly a policy call.
The goal was to show examples of why the workaround was insufficient.
Yes, the response to any particular example could be "just don't
choose to do that", but I'm hoping to show examples from several
different angles of how the flag is a valuable knob for usermode to
have.
All that being said, I am still missing any justification for the
dedicated swap storage. This is an ABI thing so the reasoning should be
really solid.
I'm hoping it is. I sympathize with the awkwardness of "swapon, but
don't swap!". But from what I can tell, there is no other route that
wouldn't be hugely disruptive and risk breaking compatibility for
folks who want to continue to combine their hibernate and swap
regions.

I don't think this digs the design hole deeper. Yes, the ship on this
design sailed long ago. But if we ever did try to dig ourselves
out of the swap/hibernate hole by providing new APIs to handle them
separately, this flag would serve as a good cutover to divert out of
the swap code and into the new shiny hibernate-only code. The APIs are
never going to be totally disentangled, so a clean cutover opportunity
is the best one can hope for.

-Evan