Re: [RFC][PATCH] kvm: add suspend pm-notifier

From: Marc Zyngier <maz@kernel.org>
Date: 2021-06-04 10:04:02
Also in: kvm, kvmarm, linux-arm-kernel, linux-s390, lkml

On Fri, 04 Jun 2021 10:20:28 +0100,
Sergey Senozhatsky [off-list ref] wrote:

On (21/06/04 09:46), Marc Zyngier wrote:
[..]

quoted

+void kvm_arch_pm_notifier(struct kvm *kvm)
+{
+#ifdef CONFIG_PM
+	int c;
+
+	mutex_lock(&kvm->lock);
+	for (c = 0; c < kvm->created_vcpus; c++) {
+		struct kvm_vcpu *vcpu = kvm->vcpus[c];
+		int r;
+
+		if (!vcpu)
+			continue;

Wouldn't kvm_for_each_vcpu() avoid this kind of checks?

Right, that's what I do in v2, which I haven't posted yet.

[..]

quoted

+#include <linux/notifier.h>
+
 #ifndef KVM_MAX_VCPU_ID
 #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
 #endif

@@ -579,6 +581,10 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+
+#ifdef CONFIG_PM
+	struct notifier_block pm_notifier;
+#endif

I'd certainly like to be able to opt out from this on architectures
that do not implement anything useful in the PM callbacks.

Well on the other hand PM-callbacks are harmless on those archs, they
won't overload the __weak function.

I don't care much for the callbacks. But struct kvm is bloated enough,
and I'd be happy not to have this structure embedded in it if I can
avoid it.

quoted

Please consider making this an independent config option that individual
archs can buy into.

Sure, I can take a look into this, but how is this better than __weak
function? (that's a real question)

Weak functions are OK AFAIC. More crud in struct kvm is what I'm not
OK with.

[..]

quoted

+#ifdef CONFIG_PM
+static int kvm_pm_notifier_call(struct notifier_block *bl,
+				unsigned long state,
+				void *unused)
+{
+	struct kvm *kvm = container_of(bl, struct kvm, pm_notifier);
+
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+		kvm_arch_pm_notifier(kvm);

How about passing the state to the notifier callback? I'd expect it to
be useful to do something on resume too.

For different states we can have different kvm_arch functions instead.
kvm_arch_pm_notifier() can be renamed to kvm_arch_suspend_notifier(),
so that we don't need to have `switch (state)` in every arch-code. Then
for resume/post resume states we can have kvm_arch_resume_notifier()
arch functions.

I'd rather we keep an arch API that is similar to the one the rest of
the kernel has, instead of a flurry of small helpers that need to grow
each time someone adds a new PM state. A switch() in the arch-specific
implementation is absolutely fine.

quoted

+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static void kvm_init_pm_notifier(struct kvm *kvm)
+{
+	kvm->pm_notifier.notifier_call = kvm_pm_notifier_call;
+	kvm->pm_notifier.priority = INT_MAX;

How is this priority determined?

Nothing magical here. I want this to be executed first, before we suspend
ftrace, RCU and the like. Besides KVM is usually the last one to register
its PM callbacks, so there can be something on the notifier list that
returns NOTIFY_STOP_MASK in front of KVM PM-notifier list entry.

Which begs the question: should arch-specific code be able to veto
suspend and return an error itself? Always returning NOTIFY_DONE seems
highly optimistic.

	M.

-- 
Without deviation from the norm, progress is not possible.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help