Thread: 14 messages, 5 authors, 2019-07-16

Re: [RFC PATCH v2 0/3] Support CPU hotplug for ARM64

From: James Morse <james.morse@arm.com>
Date: 2019-07-15 13:43:23
Also in: kvmarm, linux-acpi, lkml

Hi Maran,

On 10/07/2019 17:05, Maran Wilson wrote:
On 7/10/2019 2:15 AM, Marc Zyngier wrote:
On 09/07/2019 20:06, Maran Wilson wrote:
On 7/5/2019 3:12 AM, James Morse wrote:
On 29/06/2019 03:42, Xiongfeng Wang wrote:
This patchset marks all the GICC nodes in the MADT as possible CPUs, even those
that are disabled, but only the enabled GICC nodes are marked as present CPUs.
That way the kernel will initialize some CPU-related data structures in advance,
before the CPU is actually hot-added to the system. This patchset also implements
'acpi_(un)map_cpu()' and 'arch_(un)register_cpu()' for ARM64. These functions are
needed to enable CPU hotplug.

To support CPU hotplug, we need to add all the possible GICC nodes to the MADT,
including those for CPUs that are not present but may be hot-added later. Those
CPUs are marked as disabled in their GICC nodes.
... what do you need this for?

(The term cpu-hotplug in the arm world almost never means hot-adding a new package/die to
the platform, we usually mean taking CPUs online/offline for power management. e.g.
cpuhp_offline_cpu_device())
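To illustrate that usage: on Linux, taking an already-present CPU offline or online is normally done through the `online` file under `/sys/devices/system/cpu/`. The following is a minimal sketch only. The helper names and the `root` parameter (there so the code can be pointed at a scratch directory) are made up for illustration, and writing to the real file needs root privileges and a kernel with CPU hotplug enabled.

```python
from pathlib import Path

# Standard sysfs location for per-CPU control files on Linux.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def online_path(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Return the path of the 'online' control file for one CPU."""
    if cpu < 0:
        raise ValueError("cpu index must be non-negative")
    return root / f"cpu{cpu}" / "online"


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Write '1' to bring the CPU online or '0' to take it offline.

    Some CPUs (often cpu0) may not expose this file at all.
    """
    online_path(cpu, root).write_text("1" if online else "0")


def is_cpu_online(cpu: int, root: Path = SYSFS_CPU_ROOT) -> bool:
    """Read back the current state from the same file."""
    return online_path(cpu, root).read_text().strip() == "1"
```

Nothing here adds a CPU to the platform: the CPU is already present, and the write only changes whether the kernel schedules onto it.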

It looks like you're adding support for hot-adding a new package/die to the platform ...
but only for virtualisation.

I don't see why this is needed for virtualisation. The in-kernel irqchip needs to know
these vcpu exist before you can enter the guest for the first time. You can't create them
late. At best you're saving the host scheduling a vcpu that is offline. Is this really a
problem?

If we moved PSCI support to user-space, you could avoid creating host vcpu threads until
the guest brings the vcpu online, which would solve that problem, and save the host
resources for the thread too. (and it's ACPI/DT agnostic)

I don't see the difference here between booting the guest with 'maxcpus=1', and bringing
the vcpu online later. The only real difference seems to be moving the can-be-online
policy into the hypervisor/VMM...
Isn't that an important distinction from a cloud service provider's
perspective?
Host cpu-time is. Describing this as guest vcpus is a bit weird.

I'd expect the statement to be something like "you're paying for 50% of one Xeon v-whatever".
It shouldn't make a difference if I run 8 vcpus or 2, the amount of cpu-time would still
be constrained by the cloud provider.

As far as I understand it, you also need CPU hotplug capabilities to
support things like Kata runtime under Kubernetes. i.e. when
implementing your containers in the form of light weight VMs for the
additional security ... and the orchestration layer cannot determine
ahead of time how much CPU/memory resources are going to be needed to
run the pod(s).
Why would it be any different? You can pre-allocate your vcpus, leave
them parked until some external agent decides to signal the container
that it can use another bunch of CPUs. At that point, the container
must actively boot these vcpus (they aren't going to come up by magic).

Given that you must have sized your virtual platform to deal with the
maximum set of resources you anticipate (think of the GIC
redistributors, for example), I really wonder what you gain here.
Maybe I'm not following the alternative proposal completely, but wouldn't a guest VM (who
happens to be in control of its OS) be able to add/online vCPU resources without approval
from the VMM this way?
The in-kernel PSCI implementation will allow all CPUs to be online/offline. If we moved
that support to the VMM, it could apply some policy as to whether a cpu-online call
succeeds or fails.
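
As a rough illustration of that kind of policy (not something taken from the patchset or from KVM), a VMM that handled PSCI itself could decide per call whether CPU_ON succeeds. In the sketch below the `VcpuPolicy` class and its fields are hypothetical, and vcpus are identified by a plain index rather than an MPIDR to keep it short. Only the function ID and return codes come from the PSCI specification.

```python
from dataclasses import dataclass, field
from typing import Set

# Function ID and return codes as defined by the Arm PSCI specification.
PSCI_FN64_CPU_ON = 0xC4000003
PSCI_SUCCESS = 0
PSCI_INVALID_PARAMETERS = -2
PSCI_DENIED = -3
PSCI_ALREADY_ON = -4


@dataclass
class VcpuPolicy:
    """Hypothetical VMM-side bookkeeping for PSCI CPU_ON requests."""

    created: int       # vcpus created before the guest first ran
    allowed: Set[int]  # vcpus the VMM currently permits to come online
    online: Set[int] = field(default_factory=lambda: {0})  # boot vcpu only

    def handle_cpu_on(self, target: int) -> int:
        """Return the PSCI status for a guest CPU_ON call on `target`."""
        if not 0 <= target < self.created:
            return PSCI_INVALID_PARAMETERS  # no such vcpu was ever created
        if target in self.online:
            return PSCI_ALREADY_ON
        if target not in self.allowed:
            return PSCI_DENIED              # exists, but policy says no
        self.online.add(target)
        return PSCI_SUCCESS
```

An external agent would then "hot-add" capacity by adding an index to `allowed`; the guest still has to issue the CPU_ON call itself.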


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel