Thread (16 messages) 16 messages, 2 authors, 2021-10-11

RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out

From: Shameerali Kolothum Thodi <hidden>
Date: 2021-08-06 12:24:45
Also in: kvmarm, lkml

-----Original Message-----
From: Shameerali Kolothum Thodi
Sent: 03 August 2021 16:57
To: 'Will Deacon' <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
james.morse@arm.com; julien.thierry.kdev@gmail.com;
suzuki.poulose@arm.com; jean-philippe@linaro.org;
Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
[off-list ref]
Subject: RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
schedule out


quoted
-----Original Message-----
From: Will Deacon [mailto:will@kernel.org]
Sent: 03 August 2021 16:31
To: Shameerali Kolothum Thodi <redacted>
Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
linux-kernel@vger.kernel.org; maz@kernel.org; catalin.marinas@arm.com;
james.morse@arm.com; julien.thierry.kdev@gmail.com;
suzuki.poulose@arm.com; jean-philippe@linaro.org;
Alexandru.Elisei@arm.com; qperret@google.com; Linuxarm
[off-list ref]
Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
schedule out

On Tue, Aug 03, 2021 at 12:55:25PM +0000, Shameerali Kolothum Thodi
wrote:
quoted
quoted
quoted
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index 5584e84aed95..5fd51f5445c1 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -116,6 +116,12 @@ static u64 new_vmid(struct kvm_vmid
*kvm_vmid)
quoted
 	return idx2vmid(vmid) | generation;
 }

+/* Call with preemption disabled */
+void kvm_arm_vmid_clear_active(void)
+{
+	atomic64_set(this_cpu_ptr(&active_vmids), 0);
+}
I think this is very broken, as it will force everybody to take the
slow-path when they see an active_vmid of 0.
Yes. I have seen that happening in my test setup.
Why didn't you say so?!
Sorry. I thought of getting some performance numbers with and
without this patch and measure the impact. But didn't quite get time
to finish it yet.
These are some test numbers with and without this patch, run on two
different test setups.


a)Test Setup -1
-----------------------

Platform: HiSilicon D06 with 128 CPUs, VMID bits = 16
Run 128 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
5 times before exiting.

Measurements taken avg. of 10 Runs.

Image : 5.14-rc3
---------------------------
  Time(s)       44.43813888
  No. of exits    145,348,264

Image: 5.14-rc3 + vmid-v3
----------------------------------------
  Time(s)        46.59789034
  No. of exits     133,587,307

%diff against 5.14-rc3
  Time: 4.8% more
  Exits: 8% less 

Image: 5.14-rc3 + vmid-v3 + Without active_asid clear
---------------------------------------------------------------------------
  Time(s)         44.5031782
  No. of exits      144,443,188

%diff against 5.14-rc3
  Time: 0.15% more
  Exits: 2.42% less

b)Test Setup -2
-----------------------

Platform: HiSilicon D06 + Kernel with maxcpus set to 8 and VMID bits set to 4.
Run 40 VMs concurrently each with 2 vCPUs. Each Guest will execute hackbench
5 times before exiting.

Measurements taken avg. of 10 Runs.

Image : 5.14-rc3-vmid4bit
------------------------------------
  Time(s)        46.19963266
  No. of exits     23,699,546

Image: 5.14-rc3-vmid4bit + vmid-v3
---------------------------------------------------
  Time(s)          45.83307736
  No. of exits      23,260,203

%diff against 5.14-rc3-vmid4bit
  Time: 0.8% less
  Exits: 1.85% less 

Image: 5.14-rc3-vmid4bit + vmid-v3 + Without active_asid clear
-----------------------------------------------------------------------------------------
  Time(s)           44.5031782
  No. of exits        144,443,188

%diff against 5.14-rc3-vmid4bit
  Time: 1.05% less
  Exits: 2.06% less

As expected, the active_asid clear on schedule out is not helping.
But without this patch, the numbers seems to be better than the
vanilla kernel when we force the setup(cpus=8, vmd=4bits)
to perform rollover.

Please let me know your thoughts.

Thanks,
Shameer
quoted
quoted
quoted
It also doesn't solve the issue I mentioned before, as an active_vmid of 0
means that the reserved vmid is preserved.

Needs more thought...
How about we clear all the active_vmids in kvm_arch_free_vm() if it
matches the kvm_vmid->id ? But we may have to hold the lock
there
I think we have to be really careful not to run into the "suspended
animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
allocator handle suspended animation") if we go down this road.

Ok. I will go through that.
quoted
Maybe something along the lines of:

ROLLOVER

  * Take lock
  * Inc generation
    => This will force everybody down the slow path
  * Record active VMIDs
  * Broadcast TLBI
    => Only active VMIDs can be dirty
    => Reserve active VMIDs and mark as allocated

VCPU SCHED IN

  * Set active VMID
  * Check generation
  * If mismatch then:
        * Take lock
        * Try to match a reserved VMID
        * If no reserved VMID, allocate new

VCPU SCHED OUT

  * Clear active VMID

but I'm not daft enough to think I got it right first time. I think it
needs both implementing *and* modelling in TLA+ before we merge it!
Ok. I need some time to digest the above first :).

On another note, how serious do you think is the problem of extra
reservation of the VMID space? Just wondering if we can skip this
patch for now or not..

Thanks,
Shameer
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help