[PATCH v4 10/10] ARM/ARM64: KVM: Emulate PSCI v0.2 CPU_SUSPEND

From: Christoffer Dall <hidden>
Date: 2014-03-17 17:49:44

On Mon, Mar 17, 2014 at 12:24:28PM +0530, Anup Patel wrote:

On Mon, Mar 17, 2014 at 9:11 AM, Christoffer Dall
[off-list ref] wrote:

quoted

On Thu, Feb 06, 2014 at 05:01:42PM +0530, Anup Patel wrote:

quoted

This patch adds emulation of PSCI v0.2 CPU_SUSPEND function call for
KVM ARM/ARM64. This is a VCPU-level function call which can suspend
current VCPU or all VCPUs within current VCPU's affinity level.

The CPU_SUSPEND emulation is not tested much because currently there
is no CPUIDLE driver in Linux kernel that uses PSCI CPU_SUSPEND. The
PSCI CPU_SUSPEND implementation in ARM64 kernel was tested using a
Simple CPUIDLE driver which is not published due to unstable DT-bindings
for PSCI.
(For more info, http://lwn.net/Articles/574950/)

Even if we had stable DT-bindings for PSCI and CPUIDLE driver that
uses PSCI CPU_SUSPEND then still we need to define SUSPEND states
for KVM ARM/ARM64. Due to this, the CPU_SUSPEND emulation added
by this patch only pause a VCPU and to wakeup a VCPU we need to
explicity call PSCI CPU_ON from Guest.

[...]

quoted

+
+static void psci_do_suspend(void *context)
+{
+     struct psci_suspend_info *sinfo = context;
+
+     sinfo->vcpu->arch.pause = true;
+     sinfo->vcpu->arch.suspend = true;
+     sinfo->vcpu->arch.suspend_entry = sinfo->saved_entry;
+     sinfo->vcpu->arch.suspend_context_id = sinfo->saved_context_id;

I don't really understand this, why are you not just setting pause =
true and modifying the registers as per the reentry rules in the spec?

Doesn't seem like this patch ever reads any of these values back?

Thats because we don't have any wake-up events defined for PSCI v0.2
emulated by KVM.

That doesn't make the code any more useful.  All you're doing which has
an effect here is setting pause = true.

If you're adding the other logic to create an infrastructure for some
later time where you plan to add some logic, then (1) I think you should
wait with introducing this infrastructure until you're going to use it,
and (2) it needs a big fat comment and an explanation of this in the
commit message.

quoted

+}
+
+static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
+{
+     int i;
+     unsigned long mpidr;
+     unsigned long target_affinity;
+     unsigned long target_affinity_mask;
+     unsigned long lowest_affinity_level;
+     struct kvm *kvm = vcpu->kvm;
+     struct kvm_vcpu *tmp;
+     struct psci_suspend_info sinfo;
+
+     target_affinity = kvm_vcpu_get_mpidr(vcpu);
+     lowest_affinity_level = (*vcpu_reg(vcpu, 1) >> 24) & 0x3;
+
+     /* Determine target affinity mask */
+     target_affinity_mask = MPIDR_HWID_BITMASK;
+     switch (lowest_affinity_level) {
+     case 0: /* All affinity levels are valid */
+             target_affinity_mask &= ~0x0UL;
+             break;
+     case 1: /* Aff0 ignored */
+             target_affinity_mask &= ~0xFFUL;
+             break;
+     case 2: /* Aff0 and Aff1 ignored */
+             target_affinity_mask &= ~0xFFFFUL;
+             break;
+     case 3: /* Aff0, Aff1, and Aff2 ignored */
+             target_affinity_mask &= ~0xFFFFFFUL;
+             break;
+     default:
+             return KVM_PSCI_RET_INVAL;
+     };

I feel like I've read this code before, can you factor it out?

OK, I will factor-out this portion since it can be shared
with AFFINTIY_INFO emulation.

quoted

+
+     /* Ignore other bits of target affinity */
+     target_affinity &= target_affinity_mask;
+
+     /* Prepare suspend info */
+     sinfo.vcpu = NULL;
+     sinfo.saved_entry = *vcpu_reg(vcpu, 2);
+     sinfo.saved_context_id = *vcpu_reg(vcpu, 3);
+
+     /* Suspend all VCPUs within target affinity */
+     kvm_for_each_vcpu(i, tmp, kvm) {
+             mpidr = kvm_vcpu_get_mpidr(tmp);
+             if (((mpidr & target_affinity_mask) == target_affinity) &&
+                 !tmp->arch.suspend) {
+                     sinfo.vcpu = tmp;
+                     smp_call_function_single(tmp->cpu,
+                                              psci_do_suspend, &sinfo, 1);

Hmmm, are you sure this is correct?  How does that correspond to the
PSCI docs saying

"It is only possible to call CPU_SUSPEND from the current core. That is,
it is not possible to request suspension of another core."

I would think this means, if all other cores in the specified affinity
level are already suspended, allow suspending the entire
cluster/group/..., but I may be wrong.

Actually, CPU_SUSPEND is for all cores belonging to affinity
of current core.

I don't think so, see Mark's response.  I think the note I quoted above
about it not being possible to request suspend of other cores makes it
clear that this is not the intended behavior.

quoted

My comments above notwithstanding, this also feels racy.  What happens
if two virtual cores within the same affinity level calls CPU_SUSPEND at
the same time?

Yes, I know its racy. I was expecting this comment.

uh, ok :)  Maybe that would make this an RFC patch and known race
conditions should be pointed out at least in the commit message.

What would be appropriate lock to protect per-VCPU suspend context?

I think spinlock would be better because psci_do_suspend() is called
using SMP IPIs.

Since we are not keeping any live state for anything else than each
vcpu, this should all just be local operations and you don't need
locking at all.

If needed, a per-VM spin-lock for psci state seems appropriate, but I
didn't think carefully about this.

quoted

Also, there doesn't seem to be any handling of the StateType requested
by the caller, the reentry behavior is very different depending on the
state you enter, unless you always treat the request as a suspend
(clause 3 under Section 5.4.2 in the PSCI spec), in that case that
deserves a comment.

The StateType is completely implementation dependent. Also, it is the
StateType that will help determine the wake-up event.

How do you arrive at this conclusion?

The StateID is completely platform-specific.

The StateType is referenced throughout the document, and the reentry
from standby vs. power-down is very different (return to caller vs.
reentry to different entry point address).

The only exception I can find in the spec is that power-down may not
succeed and the firmware may just do standby instead, if this is what
we're doing, this needs to be very clearly commented.

For KVM, we really don't have any StateType defined hence we don't
have any wake-up events defined for KVM PSCI.

StateType is defined in the PSCI spec, see above.

Should we have KVM specific suspend states?
What would KVM suspend states look like because suspend states
are platform specific?

Do you mean StateID here?  I don't think we need anything for KVM.

[...]

Thanks,
-Christoffer

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help