Thread: 30 messages, 4 authors, 2017-07-04

Re: [dm-devel] [PATCH 1/1] block: Convert hd_struct in_flight from atomic to percpu

From: Brian King <hidden>
Date: 2017-07-01 02:18:04
Also in: dm-devel

On 06/30/2017 06:26 PM, Jens Axboe wrote:
On 06/30/2017 05:23 PM, Ming Lei wrote:
Hi Brian,

On Sat, Jul 1, 2017 at 2:33 AM, Brian King [off-list ref] wrote:
On 06/30/2017 09:08 AM, Jens Axboe wrote:
Compared with the fully percpu approach, this might help 1:M or N:M
mappings, but it won't help a 1:1 mapping (NVMe), where an hctx is mapped
to each CPU (especially with a huge number of hw queues on a big system) :-(
Not disagreeing with that, without having some mechanism to only
loop queues that have pending requests. That would be similar to the
ctx_map for sw to hw queues. But I don't think that would be worthwhile
doing, I like your pnode approach better. However, I'm still not fully
convinced that one per node is enough to get the scalability we need.
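
To make the trade-off in this exchange concrete, here is a toy model in Python (not kernel code) of an in-flight counter split into shards. The class name `ShardedInFlight` and the contiguous CPU-to-node numbering are assumptions made for illustration; the 80 CPUs and 4 nodes are the figures Brian gives for his test system. It only shows the accounting idea: updates touch one shard, while a read has to sum all of them.

```python
class ShardedInFlight:
    """Toy in-flight counter split into independent shards (hypothetical)."""

    def __init__(self, nr_shards, shard_of_cpu):
        self.counts = [0] * nr_shards
        self.shard_of_cpu = shard_of_cpu  # maps a CPU number to a shard index

    def inc(self, cpu):
        # Request start: only the submitting CPU's shard is touched.
        self.counts[self.shard_of_cpu(cpu)] += 1

    def dec(self, cpu):
        # Request completion: may run on a different CPU than the submission.
        self.counts[self.shard_of_cpu(cpu)] -= 1

    def read(self):
        # Readers pay for the sharding: they must walk every shard.
        return sum(self.counts)


NR_CPUS, NR_NODES = 80, 4  # system size from the message

# One shard per CPU (fully percpu) versus one shard per NUMA node.
# Assumption: CPUs are numbered contiguously within a node.
per_cpu = ShardedInFlight(NR_CPUS, lambda cpu: cpu)
per_node = ShardedInFlight(NR_NODES, lambda cpu: cpu * NR_NODES // NR_CPUS)

for counter in (per_cpu, per_node):
    for cpu in range(NR_CPUS):
        counter.inc(cpu)          # one request submitted on every CPU
    counter.dec(NR_CPUS - 1)      # two completions land on the last CPU,
    counter.dec(NR_CPUS - 1)      # although only one was submitted there

print("per-cpu :", len(per_cpu.counts), "shards, in flight =", per_cpu.read())
print("per-node:", len(per_node.counts), "shards, in flight =", per_node.read())
```

In this model a single shard can go negative when a request completes on a different CPU than it was submitted on; only the sum is meaningful. With per-node shards, 20 CPUs share each counter, which is the scalability concern raised above.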

Would be great if Brian could re-test with your updated patch, so we
know how it works for him at least.
I'll try running with both approaches today and see how they compare.
Focus on Ming's, a variant of that is the most likely path forward,
imho. It'd be great to do a quick run on mine as well, just to establish
how it compares to mainline, though.
On my initial runs, your patch, Jens, appears to perform a bit better, although
both are a huge improvement over what I was seeing before.

I ran 4k random reads using fio to null_blk in two configurations on my 20-core
system with 4 NUMA nodes and 4-way SMT, so 80 logical CPUs. I ran both 80 threads
to a single null_blk as well as 80 threads to 80 null_block devices, so one thread
Could you clarify what '80 null_block devices' means? Did you create
80 null_blk devices, or one null_blk device with 80 hw queues set via
the 'submit_queues' module parameter?
That's a valid question, was going to ask that too. But I assumed that Brian
used submit_queues to set as many queues as he has logical CPUs in the system.
I guess we should focus on the multi-queue case, since that is how NVMe normally operates.
per null_blk. This is what I saw on this machine:

Using the Per node atomic change from Ming Lei
1 null_blk, 80 threads
iops=9376.5K

80 null_blk, 1 thread
iops=9523.5K


Using the alternate patch from Jens using the tags
1 null_blk, 80 threads
iops=9725.8K

80 null_blk, 1 thread
iops=9569.4K
If '1 thread' means a single fio job, the number looks far too high: it would
mean one random IO completes in about 0.1us (100ns) on a single CPU. Not sure
that is possible :-)
It means either 1 null_blk device, 80 threads running IO to it. Or 80 null_blk
devices, each with a thread running IO to it. See above, he details that it's
80 threads on 80 devices for that case.
Right. So the two modes I'm running in are:

1. 80 null_blk devices, each with one submit_queue, with one fio job per null_blk device,
   so 80 threads total. 80 logical CPUs
2. 1 null_blk device, with 80 submit_queues, 80 fio jobs, 80 logical CPUs.

In theory, the two should result in similar numbers. 
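
As a rough sanity check on the figures quoted earlier in this message, the sketch below converts each aggregate IOPS result into a per-thread rate and a mean interval between completions. The helper name `summarize` is made up for illustration; the IOPS values and the thread count of 80 are taken from the results above. The point is that the roughly 100ns figure only applies to the aggregate across all 80 threads, not to a single fio job.

```python
def summarize(kiops, nr_threads):
    """Return (per-thread IOPS, aggregate ns per IO, per-thread us per IO)."""
    total_iops = kiops * 1000.0
    per_thread_iops = total_iops / nr_threads
    aggregate_ns_per_io = 1e9 / total_iops        # interval across all threads
    per_thread_us_per_io = 1e6 / per_thread_iops  # interval seen by one thread
    return per_thread_iops, aggregate_ns_per_io, per_thread_us_per_io


NR_THREADS = 80  # 80 fio jobs in both modes

# Results as reported above, in thousands of IOPS.
results_kiops = {
    "per-node atomic (Ming), 1 null_blk / 80 threads": 9376.5,
    "per-node atomic (Ming), 80 null_blk / 1 thread each": 9523.5,
    "tags (Jens), 1 null_blk / 80 threads": 9725.8,
    "tags (Jens), 80 null_blk / 1 thread each": 9569.4,
}

for label, kiops in results_kiops.items():
    per_thread, agg_ns, thread_us = summarize(kiops, NR_THREADS)
    print(f"{label}: {per_thread:,.0f} IOPS/thread, "
          f"{agg_ns:.1f} ns per IO aggregate, {thread_us:.2f} us per IO per thread")
```

Each job also keeps up to 64 IOs in flight (iodepth=64), so the per-thread interval is a throughput figure rather than the latency of an individual IO.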

Here are the commands and fio configurations:

Scenario #1 (mode 2 above: 1 null_blk device with 80 submit_queues)
modprobe null_blk submit_queues=80 nr_devices=1 irqmode=0

FIO config:
[global]
buffered=0
invalidate=1
bs=4k
iodepth=64
numjobs=80
group_reporting=1
rw=randrw
rwmixread=100
rwmixwrite=0
ioengine=libaio
runtime=60
time_based

[job1]
filename=/dev/nullb0


Scenario #2 (mode 1 above: 80 null_blk devices, one submit_queue each)
modprobe null_blk submit_queues=1 nr_devices=80 irqmode=0

FIO config:
[global]
buffered=0
invalidate=1
bs=4k
iodepth=64
numjobs=1
group_reporting=1
rw=randrw
rwmixread=100
rwmixwrite=0
ioengine=libaio
runtime=60
time_based

[job1]
filename=/dev/nullb0
[job2]
filename=/dev/nullb1
[job3]
filename=/dev/nullb2
[job4]
filename=/dev/nullb3
[job5]
filename=/dev/nullb4
[job6]
filename=/dev/nullb5
[job7]
filename=/dev/nullb6
[job8]
filename=/dev/nullb7
[job9]
filename=/dev/nullb8
[job10]
filename=/dev/nullb9
[job11]
filename=/dev/nullb10
[job12]
filename=/dev/nullb11
[job13]
filename=/dev/nullb12
[job14]
filename=/dev/nullb13
[job15]
filename=/dev/nullb14
[job16]
filename=/dev/nullb15
[job17]
filename=/dev/nullb16
[job18]
filename=/dev/nullb17
[job19]
filename=/dev/nullb18
[job20]
filename=/dev/nullb19
[job21]
filename=/dev/nullb20
[job22]
filename=/dev/nullb21
[job23]
filename=/dev/nullb22
[job24]
filename=/dev/nullb23
[job25]
filename=/dev/nullb24
[job26]
filename=/dev/nullb25
[job27]
filename=/dev/nullb26
[job28]
filename=/dev/nullb27
[job29]
filename=/dev/nullb28
[job30]
filename=/dev/nullb29
[job31]
filename=/dev/nullb30
[job32]
filename=/dev/nullb31
[job33]
filename=/dev/nullb32
[job34]
filename=/dev/nullb33
[job35]
filename=/dev/nullb34
[job36]
filename=/dev/nullb35
[job37]
filename=/dev/nullb36
[job38]
filename=/dev/nullb37
[job39]
filename=/dev/nullb38
[job40]
filename=/dev/nullb39
[job41]
filename=/dev/nullb40
[job42]
filename=/dev/nullb41
[job43]
filename=/dev/nullb42
[job44]
filename=/dev/nullb43
[job45]
filename=/dev/nullb44
[job46]
filename=/dev/nullb45
[job47]
filename=/dev/nullb46
[job48]
filename=/dev/nullb47
[job49]
filename=/dev/nullb48
[job50]
filename=/dev/nullb49
[job51]
filename=/dev/nullb50
[job52]
filename=/dev/nullb51
[job53]
filename=/dev/nullb52
[job54]
filename=/dev/nullb53
[job55]
filename=/dev/nullb54
[job56]
filename=/dev/nullb55
[job57]
filename=/dev/nullb56
[job58]
filename=/dev/nullb57
[job59]
filename=/dev/nullb58
[job60]
filename=/dev/nullb59
[job61]
filename=/dev/nullb60
[job62]
filename=/dev/nullb61
[job63]
filename=/dev/nullb62
[job64]
filename=/dev/nullb63
[job65]
filename=/dev/nullb64
[job66]
filename=/dev/nullb65
[job67]
filename=/dev/nullb66
[job68]
filename=/dev/nullb67
[job69]
filename=/dev/nullb68
[job70]
filename=/dev/nullb69
[job71]
filename=/dev/nullb70
[job72]
filename=/dev/nullb71
[job73]
filename=/dev/nullb72
[job74]
filename=/dev/nullb73
[job75]
filename=/dev/nullb74
[job76]
filename=/dev/nullb75
[job77]
filename=/dev/nullb76
[job78]
filename=/dev/nullb77
[job79]
filename=/dev/nullb78
[job80]
filename=/dev/nullb79
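
The 80 job sections above follow a regular pattern (job N uses /dev/nullb(N-1)), so a job file like this can be generated rather than written by hand. The following is a minimal sketch in Python; the function name `make_job_file` and the output file name are made up for illustration, while the [global] options are copied from the config above.

```python
GLOBAL_OPTIONS = """\
[global]
buffered=0
invalidate=1
bs=4k
iodepth=64
numjobs=1
group_reporting=1
rw=randrw
rwmixread=100
rwmixwrite=0
ioengine=libaio
runtime=60
time_based
"""


def make_job_file(nr_devices):
    """Build a fio job file with one job section per null_blk device."""
    lines = [GLOBAL_OPTIONS]
    for i in range(nr_devices):
        lines.append(f"[job{i + 1}]")            # sections are numbered from 1
        lines.append(f"filename=/dev/nullb{i}")  # devices are numbered from 0
    return "\n".join(lines) + "\n"


job_file = make_job_file(80)

# Hypothetical output path; adjust as needed before running fio on it.
with open("nullb-80dev.fio", "w") as f:
    f.write(job_file)
```

Generating the file keeps the job count in step with the nr_devices value passed to modprobe when the test is repeated on a machine with a different CPU count.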





-Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center