Re: [PATCH v4] mm: Enable suspend-only swap spaces
From: Evan Green <hidden>
Date: 2021-07-27 16:40:03
Also in:
linux-api, lkml
On Tue, Jul 27, 2021 at 5:21 AM David Hildenbrand [off-list ref] wrote:
On 27.07.21 11:48, David Hildenbrand wrote:quoted
On 27.07.21 02:12, Evan Green wrote:quoted
Add a new SWAP_FLAG_HIBERNATE_ONLY that adds a swap region but refuses to allow generic swapping to it. This region can still be wired up for use in suspend-to-disk activities, but will never have regular pages swapped to it. This flag will be passed in by utilities like swapon(8), usage would probably look something like: swapon -o hibernate /dev/sda2. Currently it's not possible to enable hibernation without also enabling generic swap for a given area. One semi-workaround for this is to delay the call to swapon() until just before attempting to hibernate, and then call swapoff() just after hibernate completes. This is somewhat kludgy, and also doesn't really work to keep swap out of the hibernate region. When hibernate begins, it starts by allocating a large chunk of memory for itself. This often ends up forcing a lot of data out into swap. By this time the hibernate region is eligible for generic swap, so swap ends up leaking into the hibernate region even with the workaround. There are a few reasons why usermode might want to be able to exclusively steer swap and hibernate. One reason relates to SSD wearing. Hibernate's endurance and speed requirements are different from swap. It may for instance be advantageous to keep hibernate in primary storage, but put swap in an SLC namespace. These namespaces are faster and have better endurance, but cost 3-4x in terms of capacity. Exclusively steering hibernate and swap enables system designers to accurately partition their storage without either wearing out their primary storage, or overprovisioning their fast swap area. Another reason to allow exclusive steering has to do with security. The requirements for designing systems with resilience against offline attacks are different between swap and hibernate. Swap effectively requires a dictionary of hashes, as pages can be added and removed arbitrarily, whereas hibernate only needs a single hash for the entire image. If you've set up block-level integrity for swap and image-level integrity for hibernate, then allowing swap blocks to possibly leak out to the hibernate region is problematic, since it creates swap pages not protected by any integrity. Swap regions with SWAP_FLAG_HIBERNATE_ONLY set will not appear in /proc/meminfo under SwapTotal and SwapFree, since they are not usable as general swap. These regions do still appear in /proc/swaps.Right, and they also don't account towards the memory overcommit calculations. Thanks for extending the patch description!
No problem, thanks for all the brainwaves directed at this.
quoted
[...]quoted
+ if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) { + if (IS_ENABLED(CONFIG_HIBERNATION)) { + if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS) + return -EINVAL; + + } else { + return -EINVAL; + } + }We could do short if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) && (!IS_ENABLED(CONFIG_HIBERNATION) || (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS))) return -EINVAL; or if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)) if (!IS_ENABLED(CONFIG_HIBERNATION) || (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)) return -EINVAL;quoted
+ if (!capable(CAP_SYS_ADMIN)) return -EPERM;@@ -3335,16 +3366,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (swap_flags & SWAP_FLAG_PREFER) prio = (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT; + + if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) + p->flags |= SWP_HIBERNATE_ONLY; enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map); - pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n", + pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n", p->pages<<(PAGE_SHIFT-10), name->name, p->prio, nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10), (p->flags & SWP_SOLIDSTATE) ? "SS" : "", (p->flags & SWP_DISCARDABLE) ? "D" : "", (p->flags & SWP_AREA_DISCARD) ? "s" : "", (p->flags & SWP_PAGE_DISCARD) ? "c" : "", - (frontswap_map) ? "FS" : ""); + (frontswap_map) ? "FS" : "", + (p->flags & SWP_HIBERNATE_ONLY) ? "H" : ""); mutex_unlock(&swapon_mutex); atomic_inc(&proc_poll_event);Looks like the cleanest alternative to me, as long as we don't want to invent new interfaces. Acked-by: David Hildenbrand <redacted>Pavel just mentioned uswsusp, and I wonder if it would be a possible alternative to this patch.
I think you're right that it would be possible to isolate the hibernate image with uswsusp if you avoid using the SNAPSHOT_*SWAP* ioctls. But I'd expect performance to suffer noticeably, since now every page is making a round trip out to usermode and back. I'd still very much use the HIBERNATE_ONLY flag if it were accepted, I think there's value to it. -Evan