[PATCH v3 00/20] KVM: ARM64: Add guest PMU support

[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 01/20] ARM64: Move PMU register related defines to asm/pmu.h · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 02/20] KVM: ARM64: Define PMU data structure for each vcpu · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 03/20] KVM: ARM64: Add offset defines for PMU registers · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 03/20] KVM: ARM64: Add offset defines for PMU registers · Marc Zyngier <hidden> · 2015-10-07
[PATCH v3 04/20] KVM: ARM64: Add reset and access handlers for PMCR_EL0 register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 04/20] KVM: ARM64: Add reset and access handlers for PMCR_EL0 register · Wei Huang <hidden> · 2015-10-16
[PATCH v3 04/20] KVM: ARM64: Add reset and access handlers for PMCR_EL0 register · Shannon Zhao <hidden> · 2015-10-21
[PATCH v3 05/20] KVM: ARM64: Add reset and access handlers for PMSELR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 06/20] KVM: ARM64: Add reset and access handlers for PMCEID0 and PMCEID1 register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 07/20] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 07/20] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function · Wei Huang <hidden> · 2015-10-16
[PATCH v3 07/20] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function · Shannon Zhao <hidden> · 2015-10-21
[PATCH v3 08/20] KVM: ARM64: Add reset and access handlers for PMXEVTYPER register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 09/20] KVM: ARM64: Add reset and access handlers for PMXEVCNTR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 10/20] KVM: ARM64: Add reset and access handlers for PMCCNTR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 10/20] KVM: ARM64: Add reset and access handlers for PMCCNTR register · Wei Huang <hidden> · 2015-10-16
[PATCH v3 10/20] KVM: ARM64: Add reset and access handlers for PMCCNTR register · Shannon Zhao <hidden> · 2015-10-21
[PATCH v3 11/20] KVM: ARM64: Add reset and access handlers for PMCNTENSET and PMCNTENCLR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 12/20] KVM: ARM64: Add reset and access handlers for PMINTENSET and PMINTENCLR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 13/20] KVM: ARM64: Add reset and access handlers for PMOVSSET and PMOVSCLR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 14/20] KVM: ARM64: Add reset and access handlers for PMUSERENR register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 15/20] KVM: ARM64: Add reset and access handlers for PMSWINC register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 15/20] KVM: ARM64: Add reset and access handlers for PMSWINC register · Wei Huang <hidden> · 2015-10-16
[PATCH v3 15/20] KVM: ARM64: Add reset and access handlers for PMSWINC register · Shannon Zhao <hidden> · 2015-10-21
[PATCH v3 16/20] KVM: ARM64: Add access handlers for PMEVCNTRn and PMEVTYPERn register · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 17/20] KVM: ARM64: Add PMU overflow interrupt routing · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 17/20] KVM: ARM64: Add PMU overflow interrupt routing · Marc Zyngier <hidden> · 2015-10-07
[PATCH v3 18/20] KVM: ARM64: Reset PMU state when resetting vcpu · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 18/20] KVM: ARM64: Reset PMU state when resetting vcpu · Wei Huang <hidden> · 2015-10-16
[PATCH v3 19/20] KVM: ARM64: Free perf event of PMU when destroying vcpu · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 20/20] KVM: ARM64: Add a new kvm ARM PMU device · Shannon Zhao <hidden> · 2015-09-24
[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Wei Huang <hidden> · 2015-10-16
[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Christopher Covington <hidden> · 2015-10-16
[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Shannon Zhao <hidden> · 2015-10-21
[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Christoffer Dall <hidden> · 2015-10-26
[PATCH v3 00/20] KVM: ARM64: Add guest PMU support · Shannon Zhao <hidden> · 2015-10-27


Revisions (9)

2015-07-06 v1
2015-09-14 v2
2015-09-14 v2
2015-09-17 v2
2015-09-24 v3
2015-10-27 v3 current
2015-12-07 v5
2016-01-15 v9
2016-01-16 v9

From: Shannon Zhao <hidden>
Date: 2015-10-27 01:15:09
Also in: kvm, kvmarm


On 2015/10/26 19:33, Christoffer Dall wrote:

On Thu, Sep 24, 2015 at 03:31:05PM -0700, Shannon Zhao wrote:


This patchset adds guest PMU support for KVM on ARM64. It takes a
trap-and-emulate approach: when the guest wants to monitor an event, the
access is trapped by KVM, which calls the perf_event API to create a perf
event and then uses the relevant perf_event APIs to read the event's count.

I used perf in the guest to test this patchset. "perf list" shows the
hardware events and hardware cache events that perf supports, and
"perf stat -e EVENT" then monitors a given event. For example,
"perf stat -e cycles" counts CPU cycles and "perf stat -e cache-misses"
counts cache misses.
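Inside the guest, a generic event name such as "cycles" has to be turned into an ARMv8 event number before the guest perf driver programs it into PMXEVTYPER_EL0, which is the write KVM then traps. The sketch below models that lookup; the event numbers are ARMv8 PMUv3 common events, and the table is our reconstruction of the arm64 perf driver's PMUv3 map of that era, so treat it as an assumption. The enum is local to this sketch (it only mirrors the names in linux/perf_event.h). The three unsupported entries line up with the "<not supported>" rows in the output below.

```c
/* Generic perf hardware events; names mirror linux/perf_event.h,
 * but the values here are local to this sketch. */
enum generic_event {
	HW_CPU_CYCLES,
	HW_INSTRUCTIONS,
	HW_CACHE_REFERENCES,
	HW_CACHE_MISSES,
	HW_BRANCH_INSTRUCTIONS,
	HW_BRANCH_MISSES,
	HW_STALLED_CYCLES_FRONTEND,
	HW_STALLED_CYCLES_BACKEND,
	HW_MAX
};

#define UNSUPPORTED (-1)

/* ARMv8 PMUv3 common event numbers for each generic event. */
static const int armv8_event_map[HW_MAX] = {
	[HW_CPU_CYCLES]              = 0x11, /* CPU_CYCLES */
	[HW_INSTRUCTIONS]            = 0x08, /* INST_RETIRED */
	[HW_CACHE_REFERENCES]        = 0x04, /* L1D_CACHE */
	[HW_CACHE_MISSES]            = 0x03, /* L1D_CACHE_REFILL */
	[HW_BRANCH_INSTRUCTIONS]     = UNSUPPORTED,
	[HW_BRANCH_MISSES]           = 0x10, /* BR_MIS_PRED */
	[HW_STALLED_CYCLES_FRONTEND] = UNSUPPORTED,
	[HW_STALLED_CYCLES_BACKEND]  = UNSUPPORTED,
};

/* Return the event number the guest driver would program into
 * PMXEVTYPER_EL0, or UNSUPPORTED. */
static int map_generic_event(enum generic_event ev)
{
	if ((int)ev < 0 || ev >= HW_MAX)
		return UNSUPPORTED;
	return armv8_event_map[ev];
}
```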

Below is the output of "perf stat -r 5 sleep 5" when run in the host
and in the guest.

Host:
 Performance counter stats for 'sleep 5' (5 runs):

          0.551428      task-clock (msec)         #    0.000 CPUs utilized            ( +-  0.91% )
                 1      context-switches          #    0.002 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                48      page-faults               #    0.088 M/sec                    ( +-  1.05% )
           1150265      cycles                    #    2.086 GHz                      ( +-  0.92% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
            526398      instructions              #    0.46  insns per cycle          ( +-  0.89% )
   <not supported>      branches
              9485      branch-misses             #   17.201 M/sec                    ( +-  2.35% )

       5.000831616 seconds time elapsed                                          ( +-  0.00% )

Guest:
 Performance counter stats for 'sleep 5' (5 runs):

          0.730868      task-clock (msec)         #    0.000 CPUs utilized            ( +-  1.13% )
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                48      page-faults               #    0.065 M/sec                    ( +-  0.42% )
           1642982      cycles                    #    2.248 GHz                      ( +-  1.04% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
            637964      instructions              #    0.39  insns per cycle          ( +-  0.65% )
   <not supported>      branches
             10377      branch-misses             #   14.198 M/sec                    ( +-  1.09% )

       5.001289068 seconds time elapsed                                          ( +-  0.00% )
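perf stat derives the annotation column from the raw counts: "insns per cycle" is instructions divided by cycles, "GHz" is cycles divided by task-clock time, and "M/sec" is a count divided by task-clock time. This small C sketch reproduces those columns from the two tables above and adds a guest/host ratio; the struct and function names are ours, not perf's.

```c
/* Raw counts for one 'perf stat' run (values copied from the output above). */
struct perf_run {
	double task_clock_ms;
	double cycles;
	double instructions;
	double branch_misses;
};

/* "insns per cycle" column. */
static double ipc(const struct perf_run *r)
{
	return r->instructions / r->cycles;
}

/* "GHz" column: cycles per nanosecond of task-clock. */
static double ghz(const struct perf_run *r)
{
	return r->cycles / (r->task_clock_ms * 1e6);
}

/* "M/sec" column: events per microsecond of task-clock. */
static double m_per_sec(const struct perf_run *r, double count)
{
	return count / (r->task_clock_ms * 1e3);
}

/* Guest-over-host ratio for one metric, e.g. 1.43 means 43% more. */
static double ratio(double guest, double host)
{
	return guest / host;
}

static const struct perf_run host_run  = { 0.551428, 1150265, 526398, 9485 };
static const struct perf_run guest_run = { 0.730868, 1642982, 637964, 10377 };
```

With these figures the guest spends roughly 43% more cycles and counts roughly 21% more instructions than the host for the same "sleep 5" run, and the derived IPC drops from 0.46 to 0.39.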

This looks pretty cool!

I'll review your next patch set version in more detail.

Have you tried running a no-op cycle counter read test in the guest and
in the host?

Basically something like:

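#include <linux/types.h>	/* u64 */
#include <asm/barrier.h>	/* isb() */

/* Read the cycle counter (PMCCNTR_EL0); this traps to EL2 in a guest. */
static inline u64 read_cycles(void)
{
	u64 val;
	asm volatile("mrs %0, pmccntr_el0" : "=r" (val));
	return val;
}

static noinline void nop(void *junk)
{
}

/* Return the cycles spent around an empty function call. */
static u64 test_nop(void)
{
	u64 before, after;
	before = read_cycles();
	isb();
	nop(NULL);
	isb();
	after = read_cycles();
	return after - before;
}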

I would be very curious to see whether we get the ~6000-cycle overhead
in the guest compared to bare metal that I expect.
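A single before/after sample is easily disturbed by interrupts or, in a guest, by the vcpu being scheduled out, so a common approach is to repeat the measurement and keep the minimum. The sketch below shows that loop in portable C. It swaps clock_gettime(CLOCK_MONOTONIC) in for read_cycles() (an assumption made so it runs anywhere, since EL0 reads of PMCCNTR_EL0 need PMUSERENR_EL0 to allow them) and drops the isb() barriers, which an arm64 version would keep. read_counter() and min_nop_delta() are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Stand-in for read_cycles(): monotonic nanoseconds instead of PMCCNTR_EL0. */
static uint64_t read_counter(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static __attribute__((noinline)) void nop(void *junk)
{
	(void)junk;
	__asm__ volatile("" ::: "memory");   /* keep the call from being optimized away */
}

/* Run the no-op measurement 'iters' times and return the smallest delta,
 * the sample least disturbed by interrupts and scheduling. */
static uint64_t min_nop_delta(unsigned iters)
{
	uint64_t best = UINT64_MAX;

	for (unsigned i = 0; i < iters; i++) {
		uint64_t before = read_counter();
		nop(NULL);
		uint64_t after = read_counter();
		if (after - before < best)
			best = after - before;
	}
	return best;
}
```

Comparing min_nop_delta() results from host and guest (with the real cycle counter substituted back in) would isolate the per-read trap cost from scheduling noise.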

Ok, I'll try this while I'm doing more tests on v4.

If we do, we should consider a hot path in the EL2 assembly code that
reads the cycle counter directly, to cut the overhead and make the
measurement more precise.
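A minimal sketch of the decision such a hot path would make, written in C for readability rather than assembly. The bit positions follow the architectural ESR_ELx ISS encoding for EC 0x18 (trapped MSR/MRS), which to our understanding matches the shifts in Linux's asm/esr.h; is_pmccntr_read() and the surrounding flow are hypothetical, not code from this patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESR_EL2 fields for EC 0x18: trapped MSR, MRS or system instruction. */
#define ESR_EC(esr)        (((esr) >> 26) & 0x3f)
#define ESR_EC_SYS64       0x18
#define ISS_OP0(iss)       (((iss) >> 20) & 0x3)
#define ISS_OP2(iss)       (((iss) >> 17) & 0x7)
#define ISS_OP1(iss)       (((iss) >> 14) & 0x7)
#define ISS_CRN(iss)       (((iss) >> 10) & 0xf)
#define ISS_RT(iss)        (((iss) >> 5) & 0x1f)
#define ISS_CRM(iss)       (((iss) >> 1) & 0xf)
#define ISS_IS_READ(iss)   ((iss) & 0x1)      /* Direction: 1 = MRS */

/*
 * Hypothetical fast-path test: is this trap an MRS of PMCCNTR_EL0
 * (op0=3, op1=3, CRn=9, CRm=13, op2=0)? On a hit, store the guest's
 * destination register number in *rt.
 */
static bool is_pmccntr_read(uint32_t esr, unsigned *rt)
{
	uint32_t iss = esr & 0x1ffffff;           /* ISS is ESR bits [24:0] */

	if (ESR_EC(esr) != ESR_EC_SYS64 || !ISS_IS_READ(iss))
		return false;
	if (ISS_OP0(iss) != 3 || ISS_OP1(iss) != 3 || ISS_CRN(iss) != 9 ||
	    ISS_CRM(iss) != 13 || ISS_OP2(iss) != 0)
		return false;
	*rt = ISS_RT(iss);
	return true;
}
```

On a hit, the hot path would read the host's counter (adjusted by whatever offset the emulated guest counter needs), write it into guest register Rt, advance the guest PC by 4 and return straight to the guest, avoiding the full exit to the host that the trap-and-emulate path otherwise takes.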

-- 
Shannon
