Thread: 49 messages, 8 authors, 2012-01-25

Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

From: Konrad Rzeszutek Wilk <hidden>
Date: 2012-01-25 16:35:52
Also in: kvm, xen-devel

On Wed, Jan 25, 2012 at 02:25:12PM +0530, Raghavendra K T wrote:
On 01/18/2012 12:06 AM, Raghavendra K T wrote:
On 01/17/2012 11:09 PM, Alexander Graf wrote:
[...]
A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
B. pre-3.2.0 + Jeremy's above patches with
CONFIG_PARAVIRT_SPINLOCKS = n
C. pre-3.2.0 + Jeremy's above patches with
CONFIG_PARAVIRT_SPINLOCKS = y
D. pre-3.2.0 + Jeremy's above patches + V5 patches with
CONFIG_PARAVIRT_SPINLOCKS = n
E. pre-3.2.0 + Jeremy's above patches + V5 patches with
CONFIG_PARAVIRT_SPINLOCKS = y
[...]
Maybe it'd be a good idea to create a small in-kernel microbenchmark
with a couple threads that take spinlocks, then do work for a
specified number of cycles, then release them again and start anew. At
the end of it, we can check how long the whole thing took for n runs.
That would enable us to measure the worst case scenario.
It was a quick test: two iterations of kernbench (= 6 runs), and I had
ensured the cache was cleared:

echo "1" > /proc/sys/vm/drop_caches
ccache -C

Yes, maybe I can run the test as you mentioned.
Sorry for the late reply. I was trying to do more performance analysis.
I measured the worst-case scenario with a spinlock stress driver
[attached below]. I think S1 (below) is what you were
looking for:

Two types of scenarios:
S1:
lock()
increment counter.
unlock()

S2:
do_somework()
lock()
do_conditional_work() /* this is to give variable spinlock hold time */
unlock()
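
For readers without the attached driver, the two access patterns above can be illustrated with a small userspace sketch. This is not the spinlock_thread.ko module: it uses Python's threading.Lock (a blocking mutex, not a spinlock), measures nothing, and the function name run_scenario, the parameters work_outside and work_inside, and the rule deciding when the conditional work runs are all assumptions made for illustration.

```python
import threading

def run_scenario(nr_threads, loop_count, work_outside=0, work_inside=0):
    """Run S1 (both work_* == 0) or an S2-like pattern and return the counter."""
    lock = threading.Lock()          # stand-in for the kernel spinlock
    state = {"counter": 0}

    def worker(tid):
        for i in range(loop_count):
            # S2 only: do_somework() outside the lock
            for _ in range(work_outside):
                pass
            with lock:                               # lock()
                state["counter"] += 1                # increment counter
                # S2 only: conditional work gives a variable hold time.
                # The condition below is a made-up placeholder.
                if work_inside and (i + tid) % 2 == 0:
                    for _ in range(work_inside):
                        pass
            # unlock() happens when the with-block exits

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(nr_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

if __name__ == "__main__":
    print("S1 counter:", run_scenario(4, 1000))
    print("S2 counter:", run_scenario(4, 1000, work_outside=10, work_inside=20))
```

In both cases the final counter should equal nr_threads * loop_count; only the time spent inside and outside the critical section differs between the two patterns.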

Setup:
Machine: IBM xSeries with Intel(R) Xeon(R) X5570 2.93GHz CPU with 8
cores, 64GB RAM, 16 online cpus.
The results below were taken across a total of 18 runs of
insmod spinlock_thread.ko nr_spinlock_threads=4 loop_count=4000000

Results:
scenario S1: plain counter
==========================
    total Mega cycles taken for completion (std)
A.  12343.833333      (1254.664021)
B.  12817.111111      (917.791606)
C.  13426.555556      (844.882978)

%improvement w.r.t BASE (C vs. A)     -8.77

scenario S2: counter with variable work inside lock + do_work_outside_lock
==========================================================================
    total Mega cycles taken for completion (std)
A.  25077.888889      (1349.471703)
B.  24906.777778      (1447.853874)
C.  21287.000000      (2731.643644)

%improvement w.r.t BASE (C vs. A)     15.12
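
The two percentages can be reproduced from the reported means. The sketch below assumes that BASE is configuration A and that the compared value is configuration C, which is the reading that matches the posted numbers; the helper name improvement_pct is made up for illustration.

```python
# Mean mega-cycles copied from the two result tables above.
means = {
    "S1": {"A": 12343.833333, "B": 12817.111111, "C": 13426.555556},
    "S2": {"A": 25077.888889, "B": 24906.777778, "C": 21287.000000},
}

def improvement_pct(base, new):
    """Percent reduction in cycles relative to base; negative means slower."""
    return (base - new) / base * 100.0

if __name__ == "__main__":
    for name, m in means.items():
        # Assumption: BASE = A, compared configuration = C.
        print(f"{name}: {improvement_pct(m['A'], m['C']):.2f}%")
```

Under that assumption the script prints -8.77% for S1 and 15.12% for S2, matching the figures in the mail.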

So it seems we have a worst-case overhead of around 8%. But we see an
improvement of at least 15% once a little more time is spent in the
critical section.
Is this while collecting the histogram information about spinlocks? We found
that if you enable that for production runs, it makes them quite a bit slower.