Thread: 37 messages, 6 authors, 2015-10-27

[PATCH v3 00/20] KVM: ARM64: Add guest PMU support

From: Shannon Zhao <hidden>
Date: 2015-10-27 01:15:09
Also in: kvm, kvmarm


On 2015/10/26 19:33, Christoffer Dall wrote:
On Thu, Sep 24, 2015 at 03:31:05PM -0700, Shannon Zhao wrote:
This patchset adds guest PMU support for KVM on ARM64. It takes a
trap-and-emulate approach: when the guest wants to monitor an event, the
access is trapped by KVM, which calls the perf_event API to create a perf
event and the relevant perf_event APIs to read the event's count value.

Use perf in the guest to test this patchset. "perf list" shows the
hardware events and hardware cache events that perf supports; "perf stat
-e EVENT" then monitors a given event. For example, "perf stat -e cycles"
counts CPU cycles and "perf stat -e cache-misses" counts cache misses.
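From inside the guest, perf reaches the PMU through the perf_event_open(2) system call. As an illustration (a user-space sketch, not code from this series), the following counts user-space CPU cycles for a single function, roughly what "perf stat -e cycles" does for a whole command. Inside a guest, the PMU register accesses the guest kernel makes on behalf of such a counter are what KVM traps and backs with host perf events (in the kernel the corresponding call is perf_event_create_kernel_counter(); how this series uses it is not shown here). The names sys_perf_event_open(), count_cycles() and busy_loop() are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for syscall() */
#endif
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper, so invoke the syscall directly. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
				int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload to measure. */
static void busy_loop(void)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < 100000; i++)
		sink += i;
}

/*
 * Count user-space CPU cycles spent in fn() by the calling thread.
 * Returns -1 if no counter can be opened (no PMU exposed, perf_event_paranoid,
 * seccomp, ...), otherwise the cycle count.
 */
static long long count_cycles(void (*fn)(void))
{
	struct perf_event_attr attr;
	uint64_t count;
	ssize_t n;
	int fd;

	assert(fn != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;	/* the "cycles" event */
	attr.disabled = 1;			/* start stopped, enable below */
	attr.exclude_kernel = 1;		/* user space only */
	attr.exclude_hv = 1;

	fd = (int)sys_perf_event_open(&attr, 0 /* this thread */,
				      -1 /* any CPU */, -1, 0);
	if (fd < 0)
		return -1;

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	fn();
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	n = read(fd, &count, sizeof(count));
	close(fd);
	if (n != (ssize_t)sizeof(count))
		return -1;
	return (long long)count;
}
```

count_cycles(busy_loop) returns -1 rather than failing hard when the (virtual) CPU exposes no usable cycle counter, which is the situation in a guest without PMU support.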

Below is the output of "perf stat -r 5 sleep 5" when run on the host
and in the guest.

Host:
 Performance counter stats for 'sleep 5' (5 runs):

          0.551428      task-clock (msec)         #    0.000 CPUs utilized            ( +-  0.91% )
                 1      context-switches          #    0.002 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                48      page-faults               #    0.088 M/sec                    ( +-  1.05% )
           1150265      cycles                    #    2.086 GHz                      ( +-  0.92% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
            526398      instructions              #    0.46  insns per cycle          ( +-  0.89% )
   <not supported>      branches
              9485      branch-misses             #   17.201 M/sec                    ( +-  2.35% )

       5.000831616 seconds time elapsed                                          ( +-  0.00% )

Guest:
 Performance counter stats for 'sleep 5' (5 runs):

          0.730868      task-clock (msec)         #    0.000 CPUs utilized            ( +-  1.13% )
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                48      page-faults               #    0.065 M/sec                    ( +-  0.42% )
           1642982      cycles                    #    2.248 GHz                      ( +-  1.04% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
            637964      instructions              #    0.39  insns per cycle          ( +-  0.65% )
   <not supported>      branches
             10377      branch-misses             #   14.198 M/sec                    ( +-  1.09% )

       5.001289068 seconds time elapsed                                          ( +-  0.00% )
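The two runs can be compared field by field. The sketch below is illustrative only: it recomputes perf's insns-per-cycle column and the guest/host ratios from the 5-run means quoted above (struct perf_sample, ipc(), ratio() and print_comparison() are hypothetical names).

```c
#include <assert.h>
#include <stdio.h>

/* Means copied from the "perf stat -r 5 sleep 5" output above. */
struct perf_sample {
	double task_clock_ms;
	double cycles;
	double instructions;
};

static const struct perf_sample host  = { 0.551428, 1150265, 526398 };
static const struct perf_sample guest = { 0.730868, 1642982, 637964 };

/* Instructions per cycle, as perf prints next to "instructions". */
static double ipc(const struct perf_sample *s)
{
	assert(s->cycles > 0);
	return s->instructions / s->cycles;
}

/* Guest/host ratio for one counter; 1.0 means no difference. */
static double ratio(double guest_val, double host_val)
{
	assert(host_val > 0);
	return guest_val / host_val;
}

static void print_comparison(void)
{
	printf("IPC: host %.2f, guest %.2f\n", ipc(&host), ipc(&guest));
	printf("cycles guest/host: %.2f\n",
	       ratio(guest.cycles, host.cycles));
	printf("instructions guest/host: %.2f\n",
	       ratio(guest.instructions, host.instructions));
	printf("task-clock guest/host: %.2f\n",
	       ratio(guest.task_clock_ms, host.task_clock_ms));
}
```

Calling print_comparison() reports an IPC of 0.46 on the host and 0.39 in the guest (matching perf's own column), with the guest using about 1.43x the cycles, 1.21x the instructions and 1.33x the task-clock time. These ratios summarize a single pair of averages and do not by themselves separate the cost of PMU emulation from other virtualization overhead.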
This looks pretty cool!

I'll review your next patch set version in more detail.

Have you tried running a no-op cycle-counter read test in the guest and
on the host?

Basically something like:

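static void nop(void *junk)
{
}

/* read_cycles() stands for a helper that returns the PMU cycle counter. */
static unsigned long test_nop(void)
{
	unsigned long before, after;

	before = read_cycles();
	isb();
	nop(NULL);
	isb();
	after = read_cycles();

	return after - before;	/* cycles spent around the no-op call */
}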

I expect a ~6000-cycle overhead in the guest compared to bare metal, and
I would be very curious to see whether that is what we get.
Ok, I'll try this while I'm doing more tests on v4.
If we do, we should consider a hot path in the EL2 assembly code for
reading the cycle counter, to reduce the overhead and get more precise results.
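Checking the expected ~6000-cycle overhead presumably means running the no-op test many times on both host and guest, since any single sample can be inflated by an interrupt or preemption. A minimal sketch of such a loop, under stated assumptions: the read_cycles() helper from the test above is passed in as a function pointer, the isb() barriers are omitted because this version is not arm64-specific, and min_nop_cycles() and fake_read_cycles() are hypothetical names (the latter is only a stand-in for trying the harness off-target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: a real read_cycles() (e.g. reading PMCCNTR_EL0). */
typedef uint64_t (*cycle_reader_fn)(void);

static void nop(void *junk)
{
	(void)junk;
}

/* Call through a volatile pointer so the compiler cannot drop the call. */
static void (*volatile nop_fn)(void *) = nop;

/*
 * Run the no-op measurement 'iterations' times and return the smallest
 * delta; taking the minimum filters out samples inflated by interrupts.
 */
static uint64_t min_nop_cycles(cycle_reader_fn read_cycles, int iterations)
{
	uint64_t best = UINT64_MAX;

	assert(read_cycles != NULL && iterations > 0);
	for (int i = 0; i < iterations; i++) {
		uint64_t before, after;

		before = read_cycles();
		nop_fn(NULL);	/* on arm64, isb() would surround this call */
		after = read_cycles();

		if (after - before < best)
			best = after - before;
	}
	return best;
}

/* Hypothetical stand-in counter that advances by 7 "cycles" per read. */
static uint64_t fake_counter;

static uint64_t fake_read_cycles(void)
{
	fake_counter += 7;
	return fake_counter;
}
```

With a real reader, running this once on the host and once in the guest and subtracting the two minima is one way to estimate the per-read trap overhead under discussion.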
 
-- 
Shannon