Thread: 30 messages, 5 authors, 2021-08-10

Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

From: Ming Lei <hidden>
Date: 2021-07-09 14:21:59
Also in: linux-iommu, linux-nvme, lkml

On Fri, Jul 09, 2021 at 11:16:14AM +0100, Russell King (Oracle) wrote:
> On Fri, Jul 09, 2021 at 04:38:09PM +0800, Ming Lei wrote:
> > I observed that NVMe performance is very bad when running fio on one
> > CPU(aarch64) in remote numa node compared with the nvme pci numa node.
>
> Have you checked the effect of running a memory-heavy process using
> memory from node 1 while being executed by CPUs in node 0?

1) aarch64
[root@ampere-mtjade-04 ~]# taskset -c 0 numactl -m 0  perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

      11.511752 GB/sec
[root@ampere-mtjade-04 ~]# taskset -c 0 numactl -m 1  perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       3.084333 GB/sec


2) x86_64[1]
[root@hp-dl380g10-01 mingl]#  taskset -c 0 numactl -m 0  perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       4.193927 GB/sec
[root@hp-dl380g10-01 mingl]#  taskset -c 0 numactl -m 1  perf bench mem memcpy -s 4GB -f default
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 4GB bytes ...

       3.553392 GB/sec


[1] On this x86_64 machine, IOPS can reach 680K in the same fio NVMe test.
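
To put the four memcpy figures above side by side, here is a minimal sketch that computes remote bandwidth as a fraction of local bandwidth for each machine. The numbers are copied from the perf bench output above. The names `results` and `remote_ratio` are made up for illustration, and the sketch assumes that CPU 0 sits on NUMA node 0, so that `numactl -m 0` is the local case and `numactl -m 1` is the remote case.

```python
# Bandwidth figures in GB/sec, copied from the perf bench runs above.
# Assumption: CPU 0 belongs to node 0, so "-m 0" is local and "-m 1" is remote.
results = {
    "aarch64": {"local": 11.511752, "remote": 3.084333},
    "x86_64": {"local": 4.193927, "remote": 3.553392},
}


def remote_ratio(local_gbps, remote_gbps):
    """Return remote memcpy bandwidth as a fraction of local bandwidth."""
    return remote_gbps / local_gbps


for arch, bw in results.items():
    ratio = remote_ratio(bw["local"], bw["remote"])
    # The slowdown is the inverse of the ratio: how many times slower remote is.
    print(f"{arch}: remote/local = {ratio:.2f}, slowdown = {1 / ratio:.2f}x")
```

With these inputs, remote memcpy runs at roughly 27% of local bandwidth on the aarch64 machine and roughly 85% on the x86_64 machine.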



Thanks,
Ming


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel