[PATCH v3 00/41] Optimize KVM/ARM for VHE systems

[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 01/41] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 01/41] KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN · Julien Grall <hidden> · 2018-02-05
[PATCH v3 02/41] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 02/41] KVM: arm/arm64: Move vcpu_load call after kvm_vcpu_first_run_init · Julien Grall <hidden> · 2018-02-05
[PATCH v3 03/41] KVM: arm64: Avoid storing the vcpu pointer on the stack · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 03/41] KVM: arm64: Avoid storing the vcpu pointer on the stack · Julien Grall <hidden> · 2018-02-05
[PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE · Julien Grall <hidden> · 2018-02-05
[PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE · Julien Grall <hidden> · 2018-02-05
[PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE · Christoffer Dall <hidden> · 2018-02-08
[PATCH v3 04/41] KVM: arm64: Rework hyp_panic for VHE and non-VHE · Julien Grall <hidden> · 2018-02-09
[PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag · Julien Grall <hidden> · 2018-02-09
[PATCH v3 05/41] KVM: arm64: Move HCR_INT_OVERRIDE to default HCR_EL2 guest flag · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 06/41] KVM: arm/arm64: Get rid of vcpu->arch.irq_lines · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 07/41] KVM: arm/arm64: Add kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit · Julien Thierry <hidden> · 2018-01-17
[PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit · Christoffer Dall <hidden> · 2018-01-18
[PATCH v3 08/41] KVM: arm/arm64: Introduce vcpu_el1_is_32bit · Julien Grall <hidden> · 2018-02-09
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-01-22
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-01-25
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-02-07
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-02-07
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-02-09
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-02-13
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-02-14
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-02-14
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-02-14
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Ard Biesheuvel <hidden> · 2018-02-14
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Marc Zyngier <hidden> · 2018-02-14
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Dave.Martin@arm.com (Dave Martin) · 2018-02-15
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Julien Grall <hidden> · 2018-02-09
[PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 10/41] KVM: arm64: Move debug dirty flag calculation out of world switch · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 10/41] KVM: arm64: Move debug dirty flag calculation out of world switch · Julien Thierry <hidden> · 2018-01-17
[PATCH v3 11/41] KVM: arm64: Slightly improve debug save/restore functions · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 12/41] KVM: arm64: Improve debug register save/restore flow · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 13/41] KVM: arm64: Factor out fault info population and gic workarounds · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 13/41] KVM: arm64: Factor out fault info population and gic workarounds · Julien Thierry <hidden> · 2018-01-17
[PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run · Dave.Martin@arm.com (Dave Martin) · 2018-01-24
[PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run · Christoffer Dall <hidden> · 2018-01-25
[PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run · Julien Grall <hidden> · 2018-02-09
[PATCH v3 14/41] KVM: arm64: Introduce VHE-specific kvm_vcpu_run · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function · Dave.Martin@arm.com (Dave Martin) · 2018-01-24
[PATCH v3 15/41] KVM: arm64: Remove kern_hyp_va() use in VHE switch function · Christoffer Dall <hidden> · 2018-01-25
[PATCH v3 16/41] KVM: arm64: Don't deactivate VM on VHE systems · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch · Julien Grall <hidden> · 2018-02-09
[PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 17/41] KVM: arm64: Remove noop calls to timer save/restore from VHE switch · Julien Grall <hidden> · 2018-02-19
[PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function · Julien Grall <hidden> · 2018-02-09
[PATCH v3 18/41] KVM: arm64: Move userspace system registers into separate function · Christoffer Dall <hidden> · 2018-02-14
[PATCH v3 19/41] KVM: arm64: Rewrite sysreg alternatives to static keys · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 20/41] KVM: arm64: Introduce separate VHE/non-VHE sysreg save/restore functions · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 21/41] KVM: arm/arm64: Remove leftover comment from kvm_vcpu_run_vhe · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 22/41] KVM: arm64: Unify non-VHE host/guest sysreg save and restore functions · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 23/41] KVM: arm64: Don't save the host ELR_EL2 and SPSR_EL2 on VHE systems · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 24/41] KVM: arm64: Change 32-bit handling of VM system registers · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 25/41] KVM: arm64: Rewrite system register accessors to read/write functions · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Julien Thierry <hidden> · 2018-01-17
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Christoffer Dall <hidden> · 2018-01-18
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Julien Thierry <hidden> · 2018-01-18
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Dave.Martin@arm.com (Dave Martin) · 2018-01-23
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Christoffer Dall <hidden> · 2018-01-25
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Dave.Martin@arm.com (Dave Martin) · 2018-02-09
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Christoffer Dall <hidden> · 2018-02-13
[PATCH v3 26/41] KVM: arm64: Introduce framework for accessing deferred sysregs · Dave.Martin@arm.com (Dave Martin) · 2018-02-13
[PATCH v3 27/41] KVM: arm/arm64: Prepare to handle deferred save/restore of SPSR_EL1 · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 28/41] KVM: arm64: Prepare to handle deferred save/restore of ELR_EL1 · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 29/41] KVM: arm64: Defer saving/restoring 64-bit sysregs to vcpu load/put on VHE · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers · Julien Thierry <hidden> · 2018-01-17
[PATCH v3 30/41] KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers · Christoffer Dall <hidden> · 2018-01-18
[PATCH v3 31/41] KVM: arm64: Defer saving/restoring 32-bit sysregs to vcpu load/put · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 32/41] KVM: arm64: Move common VHE/non-VHE trap config in separate functions · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put · Julien Thierry <hidden> · 2018-01-18
[PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put · Tomasz Nowicki <hidden> · 2018-01-31
[PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put · Christoffer Dall <hidden> · 2018-02-05
[PATCH v3 33/41] KVM: arm64: Configure FPSIMD traps on vcpu load/put · Tomasz Nowicki <hidden> · 2018-01-31
[PATCH v3 34/41] KVM: arm64: Configure c15, PMU, and debug register traps on cpu load/put for VHE · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 35/41] KVM: arm64: Separate activate_traps and deactive_traps for VHE and non-VHE · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 36/41] KVM: arm/arm64: Get rid of vgic_elrsr · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 37/41] KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 38/41] KVM: arm/arm64: Move arm64-only vgic-v2-sr.c file to arm64 · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 39/41] KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHE · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 40/41] KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs · Christoffer Dall <hidden> · 2018-01-12
[PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs · Tomasz Nowicki <hidden> · 2018-02-05
[PATCH v3 41/41] KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs · Christoffer Dall <hidden> · 2018-02-08
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Yury Norov <hidden> · 2018-01-15
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-01-15
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Yury Norov <hidden> · 2018-01-17
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-01-17
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-01-18
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Yury Norov <hidden> · 2018-01-18
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-01-18
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Tomasz Nowicki <hidden> · 2018-01-22
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Tomasz Nowicki <hidden> · 2018-02-01
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Yury Norov <hidden> · 2018-02-01
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Tomasz Nowicki <hidden> · 2018-02-02
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Tomasz Nowicki <hidden> · 2018-02-02
[PATCH v3 00/41] Optimize KVM/ARM for VHE systems · Christoffer Dall <hidden> · 2018-02-08

From: Christoffer Dall <hidden>
Date: 2018-01-18 13:32:24
Also in: kvm, kvmarm

On Thu, Jan 18, 2018 at 03:18:21PM +0300, Yury Norov wrote:

On Thu, Jan 18, 2018 at 12:16:32PM +0100, Christoffer Dall wrote:

Hi Yury,

[cc'ing Alex Bennee who had some thoughts on this]

On Mon, Jan 15, 2018 at 05:14:23PM +0300, Yury Norov wrote:

On Fri, Jan 12, 2018 at 01:07:06PM +0100, Christoffer Dall wrote:

This series redesigns parts of KVM/ARM to optimize the performance on
VHE systems.  The general approach is to try to do as little work as
possible when transitioning between the VM and the hypervisor.  This has
the benefit of lower latency when waiting for interrupts and delivering
virtual interrupts, and reduces the overhead of emulating behavior and
I/O in the host kernel.

Patches 01 through 06 are not VHE specific, but rework parts of KVM/ARM
that can be generally improved.  We then add infrastructure to move more
logic into vcpu_load and vcpu_put, and we improve the handling of VFP
and debug registers.

We then introduce a new world-switch function for VHE systems, which we
can tweak and optimize for VHE systems.  To do that, we rework a lot of
the system register save/restore handling and emulation code that may
need access to system registers, so that we can defer as many system
register save/restore operations to vcpu_load and vcpu_put, and move
this logic out of the VHE world switch function.

We then optimize the configuration of traps.  On non-VHE systems, both
the host and VM kernels run in EL1, but because the host kernel should
have full access to the underlying hardware, but the VM kernel should
not, we essentially make the host kernel more privileged than the VM
kernel despite them both running at the same privilege level by enabling
VE traps when entering the VM and disabling those traps when exiting the
VM.  On VHE systems, the host kernel runs in EL2 and has full access to
the hardware (as much as allowed by secure side software), and is
unaffected by the trap configuration.  That means we can configure the
traps for VMs running in EL1 once, and don't have to switch them on and
off for every entry/exit to/from the VM.

Finally, we improve our VGIC handling by moving all save/restore logic
out of the VHE world-switch, and we make it possible to truly only
evaluate if the AP list is empty and not do *any* VGIC work if that is
the case, and only do the minimal amount of work required in the course
of the VGIC processing when we have virtual interrupts in flight.

The patches are based on v4.15-rc3, v9 of the level-triggered mapped
interrupts support series [1], and the first five patches of James' SDEI
series [2].

I've given the patches a fair amount of testing on Thunder-X, Mustang,
Seattle, and TC2 (32-bit) for non-VHE testing, and tested VHE
functionality on the Foundation model, running both 64-bit VMs and
32-bit VMs side-by-side and using both GICv3-on-GICv3 and
GICv2-on-GICv3.

The patches are also available in the vhe-optimize-v3 branch on my
kernel.org repository [3].  The vhe-optimize-v3-base branch contains
prerequisites of this series.

Changes since v2:
 - Rebased on v4.15-rc3.
 - Includes two additional patches that only do vcpu_load after
   kvm_vcpu_first_run_init and only for KVM_RUN.
 - Addressed review comments from v2 (detailed changelogs are in the
   individual patches).

Thanks,
-Christoffer

[1]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git level-mapped-v9
[2]: git://linux-arm.org/linux-jm.git sdei/v5/base
[3]: git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git vhe-optimize-v3

I tested this v3 series on ThunderX2 with IPI benchmark:
https://lkml.org/lkml/2017/12/11/364

I tried to address your comments from the v2 discussion, such as pinning
the module to a specific CPU (with taskset), increasing the number of
iterations, and setting the governor to maximum performance. The results
didn't change much, and are pretty stable.

Compared to the vanilla guest, Normal IPI delivery for v3 is 20% slower.
For v2 it was 27% slower, and for v1 it was 42% faster. What's
interesting is that the acknowledge time is much faster for v3, so the
overall time to deliver and acknowledge an IPI (2nd column) is less than
with the vanilla 4.15-rc3 kernel.
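
The percentages quoted above can be rechecked from the tables in this message. The following is a minimal sketch in Python, assuming that the first column is the delivery time and that the "Guest, v4.14" tables correspond to v1; the function and variable names are made up for illustration and are not part of the benchmark.

```python
def relative_change(baseline, value):
    """Return the change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Normal IPI delivery (first column): (vanilla guest, guest + VHE series),
# copied from the v1, v2 and v3 tables in this message.
normal_ipi = {
    "v1": (305, 176),
    "v2": (291, 370),
    "v3": (289, 347),
}

changes = {rev: relative_change(base, vhe)
           for rev, (base, vhe) in normal_ipi.items()}

for rev, pct in changes.items():
    direction = "slower" if pct > 0 else "faster"
    print(f"{rev}: {abs(pct):.0f}% {direction}")
```

Note that the second column for v3 (490 with the series against 497 without) is what the "overall time" remark refers to.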

The test setup is unchanged since v2: ThunderX2, 112 online CPUs,
guest running under qemu-kvm, emulating GICv3.

Below are the test results for v1-v3, normalized to the host vanilla
kernel dry-run time.

Yury

Host, v4.14:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      81       110
Broadcast IPI:    0      2106

Guest, v4.14:
Dry-run:          0         1
Self-IPI:        10        18
Normal IPI:     305       525
Broadcast IPI:    0      9729

Guest, v4.14 + VHE:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     176       343
Broadcast IPI:    0      9885

And for v2.

Host, v4.15:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      79       108
Broadcast IPI:    0      2102
                        
Guest, v4.15-rc:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     291       526
Broadcast IPI:    0     10439

Guest, v4.15-rc + VHE:
Dry-run:          0         2
Self-IPI:        14        28
Normal IPI:     370       569
Broadcast IPI:    0     11688

And for v3.

Host, 4.15-rc3:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:      80       110
Broadcast IPI:    0      2088

Guest, 4.15-rc3:
Dry-run:          0         1
Self-IPI:         9        18
Normal IPI:     289       497
Broadcast IPI:    0      9999

Guest, 4.15-rc3 + VHE:
Dry-run:          0         2
Self-IPI:        12        24
Normal IPI:     347       490
Broadcast IPI:    0     11906

So, I had a look at your measurement code, and just want to make a
sanity check that I understand the measurements correctly.

Firstly, if we execute something 100,000 times and summarize the result
for each run, and get anything less than 100,000 (in this case ~300),
without scaling the value, doesn't that mean that in the vast majority
of cases, you are getting 0 as your measurement?

I cannot report absolute numbers, so I posted values normalized to the
dry-run case. 300 for IPI delivery means that it is 300 times slower
than a no-op (the dry-run case). The absolute numbers look quite
reasonable: a few microseconds for a normal IPI.
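
As a rough illustration of the normalization described above, here is a minimal sketch in Python. The raw values are hypothetical and chosen only to show the arithmetic; they are not measurements from this thread, and `normalize` is not a function from the benchmark.

```python
def normalize(raw_times, dry_run_time):
    """Divide each raw measurement by the dry-run (no-op) measurement."""
    if dry_run_time <= 0:
        raise ValueError("dry-run time must be positive")
    return {name: round(t / dry_run_time) for name, t in raw_times.items()}


# Hypothetical raw values in nanoseconds, for illustration only.
raw = {"Dry-run": 10, "Self-IPI": 180, "Normal IPI": 3000}

normalized = normalize(raw, raw["Dry-run"])
print(normalized)
```

With this scheme the dry-run row is 1 by construction, and every other row reads as "times slower than a no-op".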

Ah, I see, you normalized it after the output from your benchmark.  I
thought you normalized it in the benchmark code originally, but then I
didn't see it in the patch you linked to, so wasn't sure what was going
on.

Let me know if you need absolute numbers.
https://lkml.org/lkml/2017/12/13/301

I trust you, that's fine.

Secondly, are we sure all the required memory barriers are in place?
I know that the IPI send contains an smp_wmb(), but when you read back
the value in the caller, do you have the necessary smp_wmb() on the
handler side and a corresponding smp_rmb() on the sending side?  I'm not
sure what kind of effect missing barriers for a measurement framework
like this would have, but it's worth making sure we're not chasing red
herrings here.

I don't share memory between PMUs.

PMUs?

You do share memory between your CPUs, it's the little piece of memory
that your time variable points to.

I was concerned if the read back from your sender CPU of the value
written by the receiving CPU was properly ordered, but looking at
handle_IPI and smp_call_function_single, there are barriers pretty much
all over, and I don't think a missing barrier would result in what we
see here (given that I understand the normalization above).

That obviously doesn't change that the overall turnaround time is
improved more in the v1 case than in the v3 case, which I'd like to
explore/bisect in any case.

Me too. If you have any ideas, let me know and I'll check them.

So another thing that would be very useful (which I would do myself if I
had access to a TX2) would be to simply bisect the series and run
the benchmark and see where the regression is introduced.

In case you have time for that, I have a bisectable series with the
recent KVM/ARM fixes in the 'vhe-optimize-v3-with-fixes' branch on:
git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git
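
To illustrate why bisecting is cheap even for a 41-patch series, here is a minimal sketch of the idea in Python. In practice `git bisect` drives this; the `first_bad` helper, the `is_bad` callback (standing in for "build, boot, run the benchmark, compare") and the culprit index are all hypothetical. It assumes the regression stays present in every patch after the one that introduces it.

```python
def first_bad(num_patches, is_bad):
    """Return the index of the first patch for which is_bad(index) is True.

    Assumes is_bad is False before the culprit, True from the culprit
    onwards, and True for the last patch in the series.
    """
    lo, hi = 0, num_patches - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return lo


# Hypothetical example: pretend patch index 13 introduces the regression.
CULPRIT = 13
tested = []


def is_bad(index):
    tested.append(index)      # each call stands for one benchmark run
    return index >= CULPRIT


found = first_bad(41, is_bad)
print(f"first bad patch index: {found}, benchmark runs: {len(tested)}")
```

For 41 patches this needs at most six benchmark runs, rather than one per patch.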


Thanks,
-Christoffer
