Thread: 30 messages, 5 authors, 2021-08-10

Re: [bug report] iommu_dma_unmap_sg() is very slow when running IO from remote numa node

From: Ming Lei <hidden>
Date: 2021-07-09 14:25:21
Also in: linux-arm-kernel, linux-iommu, lkml

On Fri, Jul 09, 2021 at 11:26:53AM +0100, Robin Murphy wrote:
> On 2021-07-09 09:38, Ming Lei wrote:
> > Hello,
> >
> > I observed that NVMe performance is very bad when running fio on one
> > CPU (aarch64) in a remote NUMA node compared with a CPU in the NVMe
> > PCI device's NUMA node.
> >
> > Please see the test result[1]: 327K vs. 34.9K.
> >
> > The latency trace shows that one big difference is in iommu_dma_unmap_sg():
> > 1111 nsecs vs. 25437 nsecs.
>
> Are you able to dig down further into that? iommu_dma_unmap_sg() itself
> doesn't do anything particularly special, so whatever makes the difference is
> probably happening at a lower level, and I suspect there's an SMMU involved.
> If, for instance, it turns out to go all the way down to
> __arm_smmu_cmdq_poll_until_consumed() because polling MMIO from the wrong
> node is slow, there's unlikely to be much you can do about that other than
> the global "go faster" knobs (iommu.strict and iommu.passthrough), with their
> associated compromises.

Here is the 'perf report' log:

1) good (run fio from CPUs in the NVMe's NUMA node)

-   34.86%     1.73%  fio       [nvme]              [k] nvme_process_cq
   - 33.13% nvme_process_cq
      - 32.93% nvme_pci_complete_rq
         - 24.92% nvme_unmap_data
            - 20.08% dma_unmap_sg_attrs
               - 19.79% iommu_dma_unmap_sg
                  - 19.55% __iommu_dma_unmap
                     - 16.86% arm_smmu_iotlb_sync
                        - 16.81% arm_smmu_tlb_inv_range_domain
                           - 14.73% __arm_smmu_tlb_inv_range
                                14.44% arm_smmu_cmdq_issue_cmdlist
                             0.89% __pi_memset
                             0.75% arm_smmu_atc_inv_domain
                     + 1.58% iommu_unmap_fast
                     + 0.71% iommu_dma_free_iova
            - 3.25% dma_unmap_page_attrs
               - 3.21% iommu_dma_unmap_page
                  - 3.14% __iommu_dma_unmap_swiotlb
                     - 2.86% __iommu_dma_unmap
                        - 2.48% arm_smmu_iotlb_sync
                           - 2.47% arm_smmu_tlb_inv_range_domain
                              - 2.19% __arm_smmu_tlb_inv_range
                                   2.16% arm_smmu_cmdq_issue_cmdlist
            + 1.34% mempool_free
         + 7.68% nvme_complete_rq
   + 1.73% _start


2) bad (run fio from CPUs not in the NVMe's NUMA node)

-   49.25%     3.03%  fio       [nvme]              [k] nvme_process_cq
   - 46.22% nvme_process_cq
      - 46.07% nvme_pci_complete_rq
         - 41.02% nvme_unmap_data
            - 34.92% dma_unmap_sg_attrs
               - 34.75% iommu_dma_unmap_sg
                  - 34.58% __iommu_dma_unmap
                     - 33.04% arm_smmu_iotlb_sync
                        - 33.00% arm_smmu_tlb_inv_range_domain
                           - 31.86% __arm_smmu_tlb_inv_range
                                31.71% arm_smmu_cmdq_issue_cmdlist
                     + 0.90% iommu_unmap_fast
            - 5.17% dma_unmap_page_attrs
               - 5.15% iommu_dma_unmap_page
                  - 5.12% __iommu_dma_unmap_swiotlb
                     - 5.05% __iommu_dma_unmap
                        - 4.86% arm_smmu_iotlb_sync
                           - 4.85% arm_smmu_tlb_inv_range_domain
                              - 4.70% __arm_smmu_tlb_inv_range
                                    4.67% arm_smmu_cmdq_issue_cmdlist
            + 0.74% mempool_free
         + 4.83% nvme_complete_rq
   + 3.03% _start


Thanks, 
Ming


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme