Thread: 32 messages, 3 authors, 2021-06-21

Re: vhost: multiple worker support

From: Mike Christie <michael.christie@oracle.com>
Date: 2021-06-03 18:45:50
Also in: target-devel, virtualization

On 6/3/21 5:13 AM, Stefan Hajnoczi wrote:
> On Tue, May 25, 2021 at 01:05:51PM -0500, Mike Christie wrote:
>> Results:
>> --------
>> When running with the null_blk driver and vhost-scsi I can get 1.2
>> million IOPs by just running a simple
>>
>> fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k --ioengine=libaio
>> --iodepth=128  --numjobs=8 --time_based --group_reporting --name=iops
>> --runtime=60 --eta-newline=1
>>
>> The VM has 8 vCPUs and sda has 8 virtqueues and we can do a total of
>> 1024 cmds per device. To get 1.2 million IOPs I did have to tune and
>> run the virsh emulatorpin command so the vhost threads were running
>> on different CPUs than the VM. If the vhost threads share CPUs then I
>> get around 800K.
>>
>> For a more realistic device that is also a CPU hog, like iscsi, I can
>> still get 1 million IOPs using 1 dm-multipath device over 8 iscsi paths
>> (natively it gets 1.1 million IOPs).
>
> There is no comparison against a baseline, but I guess it would be the
> same 8 vCPU guest with single queue vhost-scsi?

For the iscsi device the max IOPs for the single thread case was around
380K IOPs.

Here are the results with null_blk as the backend device with a 16
vCPU guest to give you a better picture.

fio
numjobs 1        2        4        8        12       16
--------------------------------------------------------

Current upstream (single thread per vhost-scsi device).
After 8 jobs there was no perf diff.
********************************************************
VQs
1       130k     338k     390k     404k     -        -
2       146k     440k     448k     478k     -        -
4       146k     456k     448k     482k     -        -
8       154k     464k     500k     490k     -        -
12      160k     454k     486k     490k     -        -
16      162k     460k     484k     486k     -        -

thread per VQ:
After 16 jobs there was no perf diff even if I increased
the number of guest vCPUs.
*********************************************************
1       same as above
2       166k     320k     542k     664k     558k     658k
4       156k     310k     660k     986k     860k     890k
8       156k     328k     652k     988k     972k     1074k
12      162k     336k     660k     1172k    1190k    1324k
16      162k     332k     664k     1398k    850k     1426k
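
As a rough reading aid for the two tables above (not part of the original
measurements), the sketch below computes the thread-per-VQ speedup over
current upstream at numjobs=8, the last column both tables have data for.
The dictionary and function names are made up for illustration; the
numbers are copied from the tables.

```python
# IOPS in thousands at fio numjobs=8, copied from the two tables above.
# Keys are the number of virtqueues (VQs). Names here are illustrative only.
upstream_8jobs = {2: 478, 4: 482, 8: 490, 12: 490, 16: 486}
per_vq_8jobs = {2: 664, 4: 986, 8: 988, 12: 1172, 16: 1398}


def speedups(base, new):
    """Return new/base for each VQ count, rounded to two decimals."""
    return {vqs: round(new[vqs] / base[vqs], 2) for vqs in base}


speedup = speedups(upstream_8jobs, per_vq_8jobs)
for vqs, ratio in speedup.items():
    print(f"{vqs:>2} VQs: {ratio:.2f}x")
```

With these inputs the ratio grows from about 1.4x at 2 VQs to about 2.9x
at 16 VQs.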

Note:
- For numjobs > 8, I lowered iodepth so we had a total of 1024
  cmds over all jobs.
- virtqueue_size/cmd_per_lun=1024 was used for all tests.
- If I modify vhost-scsi so vhost_scsi_handle_vq queues the
  response immediately so we never enter the LIO/block/scsi layers
  then I can get around 1.6-1.8M IOPs as the max.
- There are some device-wide locks in the LIO main IO path that
  we are hitting in these results. We are working on removing them.
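
The first note can be illustrated with a small calculation. This is only a
sketch: the exact iodepth values used for 12 and 16 jobs are not stated
above, so rounding down is an assumption, and the function name is
hypothetical.

```python
TOTAL_CMDS = 1024  # total outstanding cmds, per the notes above


def iodepth_for(numjobs, total=TOTAL_CMDS):
    """Largest per-job iodepth with numjobs * iodepth <= total (assumed rounding)."""
    return total // numjobs


for numjobs in (8, 12, 16):
    depth = iodepth_for(numjobs)
    print(f"numjobs={numjobs:<2} iodepth={depth:<3} outstanding={numjobs * depth}")
```

For 8 jobs this reproduces the --iodepth=128 from the fio command line; 12
jobs does not divide 1024 evenly, so the total lands slightly under 1024
with this rounding.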