Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
From: Scott Wood <hidden>
Date: 2013-07-09 17:13:51
Also in:
kvm
On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
=20 On 28.06.2013, at 11:20, Mihai Caraman wrote: =20quoted
lwepx faults needs to be handled by KVM and this implies additional =20codequoted
in DO_KVM macro to identify the source of the exception originated =20fromquoted
host context. This requires to check the Exception Syndrome Register (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for =20DTB_MISS,quoted
DSI and LRAT exceptions which is too intrusive for the host. Get rid of lwepx and acquire last instuction in =20kvmppc_handle_exit() byquoted
searching for the physical address and kmap it. This fixes an =20infinite loop =20 What's the difference in speed for this? =20 Also, could we call lwepx later in host code, when =20 kvmppc_get_last_inst() gets invoked?
Any use of lwepx is problematic unless we want to add overhead to the =20 main Linux TLB miss handler.
quoted
+ return; + } + + mas3 =3D mfspr(SPRN_MAS3); + pr =3D vcpu->arch.shared->msr & MSR_PR; + if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & =20MAS3_SX)))) {quoted
+ /* + * Another thread may rewrite the TLB entry in =20parallel, don'tquoted
+ * execute from the address if the execute permission =20is not set =20 Isn't this racy?
Yes, that's the point. We want to access permissions atomically with =20 the address. If the guest races here, the unpredictable behavior is =20 its own fault, but we don't want to make it worse by assuming that the =20 new TLB entry is executable just because the old TLB entry was. There's still a potential problem if the instruction at the new TLB =20 entry is valid but not something that KVM emulates (because it wouldn't =20 have trapped). Given that the guest is already engaging in =20 unpredictable behavior, though, and that it's no longer a security =20 issue (it'll just cause the guest to exit), I don't think we need to =20 worry too much about it.
quoted
+ */ + vcpu->arch.fault_esr =3D 0; + *exit_nr =3D BOOKE_INTERRUPT_INST_STORAGE; + return; + } + + /* Get page size */ + if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) =3D=3D 0) + psize_shift =3D PAGE_SHIFT; + else + psize_shift =3D MAS1_GET_TSIZE(mas1) + 10; + + mas7_mas3 =3D (((u64) mfspr(SPRN_MAS7)) << 32) | + mfspr(SPRN_MAS3);=20 You're non-atomically reading MAS3/MAS7 after you've checked for =20 permissions on MAS3. I'm surprised there's no handler that allows =20 MAS3/7 access through the new, combined SPR for 64bit systems.
There is, but then we'd need to special-case 64-bit systems. Why does =20 atomicity matter here? The MAS registers were filled in when we did =20 the tlbsx. They are thread-local. They don't magically change just =20 because the other thread rewrites the TLB entry that was used to fill =20 them.
quoted
+ addr =3D (mas7_mas3 & (~0ULL << psize_shift)) | + (geaddr & ((1ULL << psize_shift) - 1ULL)); + + /* Map a page and get guest's instruction */ + page =3D pfn_to_page(addr >> PAGE_SHIFT);=20 So it seems to me like you're jumping through a lot of hoops to make =20 sure this works for LRAT and non-LRAT at the same time. Can't we just =20 treat them as the different things they are? =20 What if we have different MMU backends for LRAT and non-LRAT? The =20 non-LRAT case could then try lwepx, if that fails, fall back to read =20 the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall =20 back to this logic.
This isn't about LRAT; it's about hardware threads. It also fixes the =20 handling of execute-only pages on current chips. -Scott=