RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
From: Shameerali Kolothum Thodi <hidden>
Date: 2021-08-06 12:24:45
Also in:
kvmarm, lkml
-----Original Message----- From: Shameerali Kolothum Thodi Sent: 03 August 2021 16:57 To: 'Will Deacon' <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; james.morse@arm.com; julien.thierry.kdev@gmail.com; suzuki.poulose@arm.com; jean-philippe@linaro.org; Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm [off-list ref] Subject: RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule outquoted
-----Original Message----- From: Will Deacon [mailto:will@kernel.org] Sent: 03 August 2021 16:31 To: Shameerali Kolothum Thodi <redacted> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu; linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com; james.morse@arm.com; julien.thierry.kdev@gmail.com; suzuki.poulose@arm.com; jean-philippe@linaro.org; Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm [off-list ref] Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi wrote:quoted
quoted
quoted
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c index 5584e84aed95..5fd51f5445c1 100644 --- a/arch/arm64/kvm/vmid.c +++ b/arch/arm64/kvm/vmid.c@@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid*kvm_vmid)quoted
return idx2vmid(vmid) | generation; } +/* Call with preemption disabled */ +void kvm_arm_vmid_clear_active(void) +{ + atomic64_set(this_cpu_ptr(&active_vmids), 0); +}I think this is very broken, as it will force everybody to take the slow-path when they see an active_vmid of 0.Yes. I have seen that happening in my test setup.Why didn't you say so?!Sorry. I thought of getting some performance numbers with and without this patch and measure the impact. But didn't quite get time to finish it yet.
These are some test numbers with and without this patch, run on two different test setups. a)Test Setup -1 ----------------------- Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16 Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench 5 times before exiting. Measurements taken avg. of 10 Runs. Image : 5.14-rc3 --------------------------- Time(s) 44.43813888 No. of exits 145,348,264 Image: 5.14-rc3 + vmid-v3 ---------------------------------------- Time(s) 46.59789034 No. of exits 133,587,307 %diff against 5.14-rc3 Time: 4.8% more Exits: 8% less Image: 5.14-rc3 + vmid-v3 + Without active_asid clear --------------------------------------------------------------------------- Time(s) 44.5031782 No. of exits 144,443,188 %diff against 5.14-rc3 Time: 0.15% more Exits: 2.42% less b)Test Setup -2 ----------------------- Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4. Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench 5 times before exiting. Measurements taken avg. of 10 Runs. Image : 5.14-rc3-vmid4bit ------------------------------------ Time(s) 46.19963266 No. of exits 23,699,546 Image: 5.14-rc3-vmid4bit + vmid-v3 --------------------------------------------------- Time(s) 45.83307736 No. of exits 23,260,203 %diff against 5.14-rc3-vmid4bit Time: 0.8% less Exits: 1.85% less Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear ----------------------------------------------------------------------------------------- Time(s) 44.5031782 No. of exits 144,443,188 %diff against 5.14-rc3-vmid4bit Time: 1.05% less Exits: 2.06% less As expected, the active_asid clear on schedule out is not helping. But without this patch, the numbers seems to be better than the vanilla kernel when we force the setup(cpus=8, vmd=4bits) to perform rollover. Please let me know your thoughts. Thanks, Shameer
quoted
quoted
quoted
It also doesn't solve the issue I mentioned before, as an active_vmid of 0 means that the reserved vmid is preserved. Needs more thought...How about we clear all the active_vmids in kvm_arch_free_vm() if it matches the kvm_vmid->id ? But we may have to hold the lock thereI think we have to be really careful not to run into the "suspended animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID allocator handle suspended animation") if we go down this road.Ok. I will go through that.quoted
Maybe something along the lines of: ROLLOVER * Take lock * Inc generation => This will force everybody down the slow path * Record active VMIDs * Broadcast TLBI => Only active VMIDs can be dirty => Reserve active VMIDs and mark as allocated VCPU SCHED IN * Set active VMID * Check generation * If mismatch then: * Take lock * Try to match a reserved VMID * If no reserved VMID, allocate new VCPU SCHED OUT * Clear active VMID but I'm not daft enough to think I got it right first time. I think it needs both implementing *and* modelling in TLA+ before we merge it!Ok. I need some time to digest the above first :). On another note, how serious do you think is the problem of extra reservation of the VMID space? Just wondering if we can skip this patch for now or not.. Thanks, Shameer
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel