Re: [PATCH] block/mq: blk map queues by core id
From: Dongli Zhang <hidden>
Date: 2019-03-23 11:14:59
Also in:
lkml
On 03/23/2019 02:34 PM, luferry wrote:
I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop. I'm sure about something. The hypervisor is KVM and disk driver is virtio-blk. [root@blk-mq ~]# dmesg |grep KVM [ 0.000000] Hypervisor detected: KVM [ 0.186330] Booting paravirtualized kernel on KVM [ 0.279106] KVM setup async PF for cpu 0 [ 0.420819] KVM setup async PF for cpu 1 [ 0.421682] KVM setup async PF for cpu 2 [ 0.422113] KVM setup async PF for cpu 3 [ 0.422434] KVM setup async PF for cpu 4 [ 0.422434] KVM setup async PF for cpu 5 [ 0.423312] KVM setup async PF for cpu 6 [ 0.423394] KVM setup async PF for cpu 7 [ 0.424125] KVM setup async PF for cpu 8 [ 0.424414] KVM setup async PF for cpu 9 [ 0.424415] KVM setup async PF for cpu 10 [ 0.425329] KVM setup async PF for cpu 11 [ 0.425420] KVM setup async PF for cpu 12 [ 0.426156] KVM setup async PF for cpu 13 [ 0.426431] KVM setup async PF for cpu 14 [ 0.426431] KVM setup async PF for cpu 15 [root@blk-mq ~]# lspci |grep block 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device [root@blk-mq ~]# lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 8 [root@blk-mq ~]# ls /sys/block/vdb/mq/ 0 1 2 3 [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id' processor : 0 core id : 0 processor : 1 core id : 0 processor : 2 core id : 1 processor : 3 core id : 1 processor : 4 core id : 2 processor : 5 core id : 2 processor : 6 core id : 3 processor : 7 core id : 3 processor : 8 core id : 4 processor : 9 core id : 4 processor : 10 core id : 5 processor : 11 core id : 5 processor : 12 core id : 6 processor : 13 core id : 6 processor : 14 core id : 7 processor : 15 core id : 7 --before this patch-- [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list 0, 4, 5, 8, 9, 12, 13 1 2, 6, 7, 10, 11, 14, 15 3 --after this patch-- [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list 0, 4, 5, 12, 13 1, 6, 7, 14, 15 2, 8, 9 3, 10, 11 I add dump_stack insert blk_mq_map_queues. [ 1.378680] Call Trace: [ 1.378690] dump_stack+0x5a/0x73 [ 1.378695] blk_mq_map_queues+0x29/0xb0 [ 1.378700] blk_mq_alloc_tag_set+0x1bd/0x2d0 [ 1.378705] virtblk_probe+0x1ae/0x8e4 [virtio_blk] [ 1.378709] virtio_dev_probe+0x18a/0x230 [virtio] [ 1.378713] really_probe+0x215/0x3f0 [ 1.378716] driver_probe_device+0x115/0x130 [ 1.378718] device_driver_attach+0x50/0x60 [ 1.378720] __driver_attach+0xbd/0x140 [ 1.378722] ? device_driver_attach+0x60/0x60 [ 1.378724] bus_for_each_dev+0x67/0xc0 [ 1.378727] ? klist_add_tail+0x57/0x70
I am not able to reproduce above call stack when virtio-blk is assigned 4 queues
while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
# cat /sys/block/vda/mq/0/cpu_list
0, 1, 2, 3
# cat /sys/block/vda/mq/1/cpu_list
4, 5, 6, 7
# cat /sys/block/vda/mq/2/cpu_list
8, 9, 10, 11
# cat /sys/block/vda/mq/3/cpu_list
12, 13, 14, 15
I do agree in above case we would have issue if the mapping is established by
blk_mq_map_queues().
However, I am just curious how we finally reach at blk_mq_map_queues() from
blk_mq_alloc_tag_set()?
It should be something like:
blk_mq_alloc_tag_set()
-> blk_mq_update_queue_map()
-> if (set->ops->map_queues && !is_kdump_kernel())
return set->ops->map_queues(set);
-> else
return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
Or the execution reach at:
virtblk_map_queues()
-> blk_mq_virtio_map_queues()
-> if (!vdev->config->get_vq_affinity)
return blk_mq_map_queues(qmap);
-> else
establish the mapping via get_vq_affinity
but vdev->config->get_vq_affinity == NULL?
For virtio pci, get_vq_affinity is always set. Seems virtio mmio would not set
get_vq_affinity.
I used to play with firecracker (by amazon) and it is interesting firecracker
uses mmio to setup virtio-blk.
Sorry for disturbing the review of this patch. I just would like to clarify in
which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
Dongli Zhang
At 2019-03-22 19:58:08, "Dongli Zhang" [off-list ref] wrote:quoted
On 03/22/2019 06:09 PM, luferry wrote:quoted
under virtual machine environment, cpu topology may differ from normal physical server.Would mind share the name of virtual machine monitor, the command line if available, and which device to reproduce. For instance, I am not able to reproduce with qemu nvme or virtio-blk as I assume they use pci or virtio specific mapper to establish the mapping. E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2 Indeed I use three queues instead of twp as one is reserved for admin. # ls /sys/block/nvme0n1/mq/* /sys/block/nvme0n1/mq/0: cpu0 cpu1 cpu2 cpu3 cpu_list nr_reserved_tags nr_tags /sys/block/nvme0n1/mq/1: cpu4 cpu5 cpu6 cpu7 cpu_list nr_reserved_tags nr_tags Thank you very much! Dongli Zhangquoted
for example (machine with 4 cores, 2 threads per core): normal physical server: core-id thread-0-id thread-1-id 0 0 4 1 1 5 2 2 6 3 3 7 virtual machine: core-id thread-0-id thread-1-id 0 0 1 1 2 3 2 4 5 3 6 7 When attach disk with two queues, all the even numbered cpus will be mapped to queue 0. Under virtual machine, all the cpus is followed by its sibling cpu.Before this patch, all the odd numbered cpus will also be mapped to queue 0, can cause serious imbalance.this will lead to performance impact on system IO So suggest to allocate cpu map by core id, this can be more currency Signed-off-by: luferry <redacted> --- block/blk-mq-cpumap.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 03a534820271..4125e8e77679 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c@@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap) { unsigned int *map = qmap->mq_map; unsigned int nr_queues = qmap->nr_queues; - unsigned int cpu, first_sibling; + unsigned int cpu, first_sibling, core = 0; for_each_possible_cpu(cpu) { /*@@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap) map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu); } else { first_sibling = get_first_sibling(cpu); - if (first_sibling == cpu) - map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu); - else + if (first_sibling == cpu) { + map[cpu] = cpu_to_queue_index(qmap, nr_queues, core); + core++; + } else map[cpu] = map[first_sibling]; } }