Thread: 23 messages, 5 authors, 2020-03-05

Re: [RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.

From: Cédric Le Goater <clg@kaod.org>
Date: 2020-03-02 15:53:24

On 2/29/20 11:51 PM, Ram Pai wrote:
On Sat, Feb 29, 2020 at 09:27:54AM +0100, Cédric Le Goater wrote:
On 2/29/20 8:54 AM, Ram Pai wrote:
XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet.

Hence Secure VM, must always default to XICS interrupt controller.
have you tried XIVE emulation 'kernel-irqchip=off' ? 
Yes, and it hangs. I think that option continues to enable some variant
of XIVE in the VM. 
HW is not involved, KVM is not involved anymore and all is emulated at 
the QEMU level in user space. What is the issue ? 
There are some known deficiencies between KVM
and the ultravisor negotiation, resulting in a hang in the SVM.
That is something else to investigate. feature/capability negotiation
is the core of the hypervisor stack : 

    OPAL <-> PowerNV <-> KVM <-> QEMU <-> guest OS
If XIVE is requested through kernel command line option "xive=on",
override and turn it off.
This is incorrect. It is negotiated through CAS depending on the FW
capabilities and the KVM capabilities.
Yes I understand, qemu/KVM have predetermined a set of capabilities that
it can offer to the VM.  The kernel within the VM has a list of
capabilities it needs to operate correctly.  So both negotiate and
determine something mutually amicable.

Here I am talking about the list of capabilities that the kernel is
trying to determine, it needs to operate correctly.  "xive=on" is one of
those capabilities the kernel is told by the VM-administrator, to enable.
XIVE is not a kernel capability. It's platform support and the default
for P9 is the native exploitation mode which makes full use of the P9
interrupt controller. For non XIVE aware kernels, the hypervisor emulates
the legacy interface on top of XIVE. 

"xive=off" was introduced for distro testing. It skips the negotiation 
process of the XIVE native exploitation mode on the guest. But it's not
a negotiation setting. It's a chicken switch.
Unfortunately if the VM-administrator blindly requests to enable it, the
kernel must override it, if it knows that will be switching the VM into
a SVM soon. No point negotiating a capability with Qemu; through CAS,
if it knows it cannot handle that capability.
I don't understand. Are you talking about SVM or XIVE ? 
If XIVE is the only supported platform interrupt controller; specified
through qemu option "ic-mode=xive", simply abort. Otherwise default to
XICS.

I don't think it is a good approach to downgrade the guest kernel 
capabilities this way. 

PAPR has specified the CAS negotiation process for this purpose. It 
comes in two parts under KVM. First the KVM hypervisor advertises or 
not a capability to QEMU. The second is the CAS negotiation process 
between QEMU and the guest OS.
Unfortunately, this is not viable.  At the time the hypervisor
advertises its capabilities to qemu, the hypervisor has no idea whether
that VM will switch into a SVM or not. 
OK, but the hypervisor knows if it can handle 'SVM' guests or not and,
if not, there is no point in advertising a 'SVM' capability to the guest. 
The decision to switch into a SVM is taken by the kernel running in the VM.
This happens much later, after the hypervisor has already conveyed its
capabilities to the qemu, and qemu has then instantiated the VM.
So you don't have negotiation with the hypervisor ? How does the guest
know the hypervisor platform can handle SVMs ? try and see if it fails ?
If so, it seems quite broken to me.
 
As a result, CAS in prom_init is the only place where this negotiation
can take place.
Euh. I don't follow. This is indeed where CAS is performed and so it's 
*the* place to check that the hypervisor has 'SVM' support ? 
The SVM specifications might not be complete yet and if some features 
are incompatible, I think we should modify the capabilities advertised 
by the hypervisor : no XIVE in case of SVM. QEMU will automatically 
use the fallback path and emulate the XIVE device, same as setting 
'kernel-irqchip=off'. 
As mentioned above, this would be an excellent approach, if the
Hypervisor was aware of the VM's intent to switch into a SVM. Neither
the hypervisor knows, nor the qemu.  Only the kernel running within the
VM knows about it.

The hypervisor (KVM/QEMU) never knows what are the guest OS capabilities
or its intents. That is why there is a negotiation process. 

I would do :

 * OPAL FW advertises 'SVM' support to the Linux PowerNV (through DT) 
 * KVM advertises 'SVM' support to QEMU (extend KVM ioctls)
 * QEMU advertises 'SVM' support to guest OS (through CAS or DT) 
 * Guest OS should not try to use SVM if it is not supported. 
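
The chain above could be sketched roughly as follows. This is only an
illustration of the layering, not existing kernel, KVM or QEMU code: the
CAP_* bits and both helper functions are hypothetical names, and each layer
is assumed to pass up only what it supports itself and what the layer below
advertised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, for illustration only. */
enum {
	CAP_XIVE = 1u << 0,
	CAP_SVM  = 1u << 1,
};

/*
 * layers[0] is the lowest layer (OPAL FW), then KVM, then QEMU.
 * A capability reaches the guest only if every layer supports it.
 */
static unsigned int advertised_caps(const unsigned int *layers, size_t n)
{
	unsigned int caps;
	size_t i;

	assert(n > 0);
	caps = layers[0];
	for (i = 1; i < n; i++)
		caps &= layers[i];
	return caps;
}

/* The guest attempts the SVM transition only if the capability reached it. */
static bool guest_may_use_svm(unsigned int caps)
{
	return (caps & CAP_SVM) != 0;
}
```

With this shape, a KVM that lacks SVM support simply masks the bit out and
the guest never tries the transition.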

If the passthrough of HW pages is not supported by Ultravisor, KVM 
should not advertise XIVE to QEMU which would then use fallback mode.

If emulated XIVE or XICS is not supported by SVM guests, then we have
a problem and we need to understand why ! :) 

And if XIVE is still a problem, then the guest could change the CAS 
request and remove XIVE when SVM is being set. I suppose that we have 
all this information before CAS. Do we ? 
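
As a rough sketch of that last option, and assuming the guest does know
before CAS that it will become an SVM, the request could be built like this.
The REQ_* bits and build_cas_request() are made-up names for illustration,
not the actual prom_init code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt mode bits for the guest's CAS request. */
enum {
	REQ_XICS = 1u << 0,	/* legacy interface */
	REQ_XIVE = 1u << 1,	/* XIVE native exploitation mode */
};

/*
 * Start from the default request and drop XIVE either when "xive=off"
 * was given or when the guest is about to switch into an SVM.
 */
static unsigned int build_cas_request(bool svm_requested, bool xive_off)
{
	unsigned int req = REQ_XICS | REQ_XIVE;

	if (xive_off || svm_requested)
		req &= ~REQ_XIVE;

	/* The legacy interface must always remain in the request. */
	assert(req & REQ_XICS);
	return req;
}
```

The decision stays a runtime one in the guest, and nothing changes for
guests that never request SVM.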

It should be a runtime choice taking into account the full software 
stack rather than a compile choice at the bottom which would impact
all other options. This is not acceptable IMHO.

Cheers,

C.