Thread (2 messages) 2 messages, 2 authors, 2021-08-20

Re: [RESEND PATCH v2 for-next] RDMA/hns: Solve the problem that dma_pool is used during the reset

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-08-20 19:29:21

On Thu, Aug 19, 2021 at 10:17:54AM +0800, Wenpeng Liang wrote:
From: Lang Cheng <redacted>

During the reset, the driver calls dma_pool_destroy() to release the
dma_pool resources. If the dma_pool_free interface is called during the
modify_qp operation, an exception will occur.

[15834.440744] Unable to handle kernel paging request at virtual address
ffffa2cfc7725678
...
[15834.660596] Call trace:
[15834.663033]  queued_spin_lock_slowpath+0x224/0x308
[15834.667802]  _raw_spin_lock_irqsave+0x78/0x88
[15834.672140]  dma_pool_free+0x34/0x118
[15834.675799]  hns_roce_free_cmd_mailbox+0x54/0x88 [hns_roce_hw_v2]
[15834.681872]  hns_roce_v2_qp_modify.isra.57+0xcc/0x120 [hns_roce_hw_v2]
[15834.688376]  hns_roce_v2_modify_qp+0x4d4/0x1ef8 [hns_roce_hw_v2]
[15834.694362]  hns_roce_modify_qp+0x214/0x5a8 [hns_roce_hw_v2]
[15834.699996]  _ib_modify_qp+0xf0/0x308
[15834.703642]  ib_modify_qp+0x38/0x48
[15834.707118]  rt_ktest_modify_qp+0x14c/0x998 [rdma_test]
...
[15837.269216] Unable to handle kernel paging request at virtual address
000197c995a1d1b4
...
[15837.480898] Call trace:
[15837.483335]  __free_pages+0x28/0x78
[15837.486807]  dma_direct_free_pages+0xa0/0xe8
[15837.491058]  dma_direct_free+0x48/0x60
[15837.494790]  dma_free_attrs+0xa4/0xe8
[15837.498449]  hns_roce_buf_free+0xb0/0x150 [hns_roce_hw_v2]
[15837.503918]  mtr_free_bufs.isra.1+0x88/0xc0 [hns_roce_hw_v2]
[15837.509558]  hns_roce_mtr_destroy+0x60/0x80 [hns_roce_hw_v2]
[15837.515198]  hns_roce_v2_cleanup_eq_table+0x1d0/0x2a0 [hns_roce_hw_v2]
[15837.521701]  hns_roce_exit+0x108/0x1e0 [hns_roce_hw_v2]
[15837.526908]  __hns_roce_hw_v2_uninit_instance.isra.75+0x70/0xb8 [hns_roce_hw_v2]
[15837.534276]  hns_roce_hw_v2_uninit_instance+0x64/0x80 [hns_roce_hw_v2]
[15837.540786]  hclge_uninit_client_instance+0xe8/0x1e8 [hclge]
[15837.546419]  hnae3_uninit_client_instance+0xc4/0x118 [hnae3]
[15837.552052]  hnae3_unregister_client+0x16c/0x1f0 [hnae3]
[15837.557346]  hns_roce_hw_v2_exit+0x34/0x50 [hns_roce_hw_v2]
[15837.562895]  __arm64_sys_delete_module+0x208/0x268
[15837.567665]  el0_svc_common.constprop.4+0x110/0x200
[15837.572520]  do_el0_svc+0x34/0x98
[15837.575821]  el0_svc+0x14/0x40
[15837.578862]  el0_sync_handler+0xb0/0x2d0
[15837.582766]  el0_sync+0x140/0x180

It is caused by two concurrent processes:
	uninit_instance->dma_pool_destroy(cmdq)
	modify_qp->dma_poll_free(cmdq)
Something else has gone wrong in your system.

modify_qp is not allowed to be running after ib_unregister_device()
returns.

I see:

 [15834.707118]  rt_ktest_modify_qp+0x14c/0x998 [rdma_test]

Which suggest to me your ULP is a test and that test is not properly
acting as an ib_client. When a client is unregistered it must close
all RDMA objects and stop all activity before the client unregister
callback returns.

Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help