Re: [PATCH RFCv2 7/9] kvm/arm64: Support async page fault
From: Marc Zyngier <maz@kernel.org>
Date: 2020-05-27 07:37:11
Also in:
kvmarm, lkml
On 2020-05-27 05:05, Gavin Shan wrote:
> Hi Mark,

[...]

>>> +struct kvm_vcpu_pv_apf_data {
>>> +	__u32	reason;
>>> +	__u8	pad[60];
>>> +	__u32	enabled;
>>> +};
>>
>> What's all the padding for?
>
> The padding is to ensure that @reason and @enabled sit in different
> cache lines. @reason is shared by host and guest, while @enabled is
> mostly owned by the guest.

So you are assuming that a cache line is at most 64 bytes. It is
actually implementation defined, and you can probe for it by looking at
the CTR_EL0 register. There are implementations ranging from 32 to 256
bytes in the wild, and let's not mention broken big-little
implementations here.

[...]
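
For illustration only, a minimal sketch of decoding the line size from a CTR_EL0 value. The field positions (DminLine in bits [19:16], CWG in bits [27:24], both log2 of a count of 4-byte words) are as described in the Arm architecture reference manual; the helper names are made up for this example and are not part of the patch or of the kernel.

```c
#include <stdint.h>

/* CTR_EL0 field positions; each field is 4 bits wide. */
#define CTR_DMINLINE_SHIFT	16
#define CTR_CWG_SHIFT		24
#define CTR_FIELD_MASK		0xfULL

/*
 * Smallest data cache line in bytes. DminLine holds log2 of the
 * number of 4-byte words in the line. (Hypothetical helper name.)
 */
static inline unsigned int ctr_dminline_bytes(uint64_t ctr)
{
	return 4U << ((ctr >> CTR_DMINLINE_SHIFT) & CTR_FIELD_MASK);
}

/*
 * Cache writeback granule in bytes, or 0 when the field is zero and
 * the implementation does not report one. (Hypothetical helper name.)
 */
static inline unsigned int ctr_cwg_bytes(uint64_t ctr)
{
	unsigned int cwg = (ctr >> CTR_CWG_SHIFT) & CTR_FIELD_MASK;

	return cwg ? 4U << cwg : 0;
}

#if defined(__aarch64__)
/* Read the register itself; only builds on arm64. */
static inline uint64_t read_ctr_el0(void)
{
	uint64_t ctr;

	__asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
	return ctr;
}
#endif
```

A DminLine value of 3 decodes to 32 bytes and a value of 6 to 256 bytes, which is the range mentioned above, so a fixed 60-byte pad only separates the two fields on some implementations.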

>>> +bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu *vcpu)
>>> +{
>>> +	u64 vbar, pc;
>>> +	u32 val;
>>> +	int ret;
>>> +
>>> +	if (!(vcpu->arch.apf.control_block & KVM_ASYNC_PF_ENABLED))
>>> +		return false;
>>> +
>>> +	if (vcpu->arch.apf.send_user_only && vcpu_mode_priv(vcpu))
>>> +		return false;
>>> +
>>> +	/* Pending page fault, which isn't acknowledged by guest */
>>> +	ret = kvm_async_pf_read_cache(vcpu, &val);
>>> +	if (ret || val)
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * Events can't be injected through data abort because it's
>>> +	 * going to clobber ELR_EL1, which might not have been consumed
>>> +	 * (or saved) by guest yet.
>>> +	 */
>>> +	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
>>> +	pc = *vcpu_pc(vcpu);
>>> +	if (pc >= vbar && pc < (vbar + vcpu->arch.apf.no_fault_inst_range))
>>> +		return false;
>>
>> Ah, so that's what this `no_fault_inst_range` is for.
>>
>> As-is this is not sufficient, and we'll need to be extremely careful
>> here.
>>
>> The vectors themselves typically only have a small amount of stub
>> code, and the bulk of the non-reentrant exception entry work happens
>> elsewhere, in a mixture of assembly and C code that isn't even
>> virtually contiguous with either the vectors or itself.
>>
>> It's possible in theory that there is code in modules (or perhaps in
>> eBPF JIT'd code) that isn't safe to take a fault from, so even having
>> a contiguous range controlled by the kernel isn't ideal.
>>
>> How does this work on x86?
>
> Yeah, here we just provide a mechanism to forbid injecting a data
> abort. The range is fed by the guest through an HVC call, so I think
> it's a guest-related issue. You had more comments about this in
> PATCH[9]. I will explain a bit more there.
>
> x86 basically relies on the EFLAGS[IF] flag. The async page fault can
> be injected if it's on. Otherwise, it's forbidden. It's workable
> because an exception is a special interrupt on x86, if I'm correct.
>
>     return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>            !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>              (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));

I really wish this was relying on an architected exception delivery
mechanism that can be blocked by the guest itself (PSTATE.{I,F,A}).
Trying to guess based on the PC won't fly. But these signals are
pretty hard to multiplex with anything else. Like any form of
non-architected exception injection, I don't see a good path forward
unless we start considering something like SDEI.
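
To make the contrast with the PC-range check concrete, here is a rough sketch of what a PSTATE-gated check could look like, in the spirit of the x86 EFLAGS[IF] test quoted above. This is not what the patch does. The helper name and the idea of passing the gating bit as a parameter are assumptions made purely for illustration; only the bit values are real (they match PSR_F_BIT, PSR_I_BIT and PSR_A_BIT in the Linux arm64 headers).

```c
#include <stdbool.h>
#include <stdint.h>

/* PSTATE/SPSR exception mask bits. */
#define PSR_F_BIT	0x00000040U	/* FIQ masked */
#define PSR_I_BIT	0x00000080U	/* IRQ masked */
#define PSR_A_BIT	0x00000100U	/* SError masked */

/*
 * Hypothetical helper: delivery is allowed only when the guest has not
 * masked the exception class the notification would ride on. Which bit
 * that is (if any) is exactly the open question; it is a parameter here
 * rather than a decision.
 */
static inline bool apf_can_deliver(uint64_t guest_pstate, uint32_t mask_bit)
{
	return !(guest_pstate & mask_bit);
}
```

With such a gate the guest decides when it is safe by setting or clearing a mask bit it already manages around its exception entry code, instead of the host inferring safety from where the PC happens to be.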
M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel