Re: v5.14 RXE driver broken?
From: Bart Van Assche <bvanassche@acm.org>
Date: 2021-08-25 20:58:24
Also in: linux-block
Subsystem: infiniband subsystem, soft-roce driver (rxe), the rest
Maintainers: Jason Gunthorpe, Leon Romanovsky, Zhu Yanjun, Linus Torvalds

On 8/25/21 11:22 AM, Bart Van Assche wrote:
> On 8/25/21 9:32 AM, Jason Gunthorpe wrote:
>> On Wed, Aug 25, 2021 at 11:02:14AM +0800, Zhu Yanjun wrote:
>>> On Tue, Aug 24, 2021 at 11:02 AM Bart Van Assche [off-list ref] wrote:
>>>> Hi Bob,
>>>>
>>>> If I run the following test against Linus' master branch then that test
>>>> passes (commit d5ae8d7f85b7 ("Revert "media: dvb header files: move some
>>>> headers to staging"")):
>>>>
>>>> # export use_siw=1 && modprobe brd && (cd blktests && ./check -q srp/002)
>>>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
>>>>     runtime  ...  48.849s
>>>>
>>>> The following test fails:
>>>>
>>>> # export use_siw= && modprobe brd && (cd blktests && ./check -q srp/002)
>>>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
>>>>     runtime  48.849s  ...  15.024s
>>>>     +++ /home/bart/software/blktests/results/nodev/srp/002.out.bad 2021-08-23 19:51:05.182958728 -0700
>>>>     @@ -1,2 +1 @@
>>>>      Configured SRP target driver
>>>>     -Passed
>>>
>>> Can this commit "RDMA/rxe: Zero out index member of struct rxe_queue"
>>> in the link
>>> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-rc
>>> fix this problem?
>>>
>>> And the commit will be merged into linux upstream very soon.
>>
>> Please let me know Bart, if the rxe driver is still broken I will
>> definitely punt all the changes for RXE to the next cycle until it can
>> be fixed.
>
> Hi Jason,
>
> Thanks for having offered to revert the RXE changes from this merge window.
> Unfortunately that wouldn't be sufficient. My test results so far for test
> srp/002 in combination with the rdma_rxe driver are as follows:
> * Kernel v5.12: test passes.
> * Kernel v5.13: test fails.
> * Kernel v5.14-rc7: test fails.
>
> For the rdma_rxe tests for kernel v5.14-rc7 I found the following in the
> kernel log:
>
> ib_srp:add_target_store: ib_srp: max_sectors = 1024; max_pages_per_mr = 512; mr_page_size = 4096; max_sectors_per_mr = 4096; mr_per_cmd = 2
> ib_srp: enp1s0_rxe: ib_alloc_mr() failed. Try to reduce max_cmd_per_lun, max_sect or ch_count
>
> There is sufficient memory available in the VM in which I ran the tests. It
> is not clear to me why ib_alloc_mr() fails with these parameters when using
> the rdma_rxe driver?
>
> As one can see in srp_alloc_fr_pool() the SRP initiator driver respects the
> max_pages_per_mr RDMA driver limit.

A correction: test srp/002 passes on my setup against kernel v5.13. I probably selected the wrong kernel from the GRUB boot menu before I sent my previous email. So the test failure is something that happens with v5.14-rc but not with v5.13. Applying the following patch on top of Linus' master branch did not help:

diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index 742e6ec93686..643b80e47c82 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -88,7 +88,7 @@ enum rxe_device_param {
 	RXE_MIN_SRQ_INDEX		= 0x00020001,
 	RXE_MAX_SRQ_INDEX		= 0x00040000,
 
-	RXE_MAX_MR			= 0x00001000,
+	RXE_MAX_MR			= 0x00100000,
 	RXE_MAX_MW			= 0x00001000,
 	RXE_MIN_MR_INDEX		= 0x00000001,
 	RXE_MAX_MR_INDEX		= 0x00010000,

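
For reference, the values in the ib_srp log line quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, and the formulas (512-byte sectors, one extra sector for a misaligned buffer, one extra MR per command) are assumptions chosen because they match the logged numbers, not a copy of the ib_srp source. The channel count and queue depth passed to total_mrs() are not in the log and have to be supplied by the reader.

```c
#include <assert.h>

/* Hypothetical helpers; assumptions are marked in the comments. */

/* Number of 512-byte sectors that a single MR can map. */
static unsigned int sectors_per_mr(unsigned int max_pages_per_mr,
				   unsigned int mr_page_size)
{
	assert(mr_page_size >= 512 && mr_page_size % 512 == 0);
	return max_pages_per_mr * (mr_page_size / 512);
}

/*
 * MRs needed per command. Assumptions: one additional MR is always
 * reserved per command, and the data buffer may need one sector more
 * than max_sectors because it does not have to be page aligned.
 */
static unsigned int mrs_per_cmd(unsigned int max_sectors,
				unsigned int max_sectors_per_mr)
{
	assert(max_sectors_per_mr > 0);
	return 1 + (max_sectors + 1 + max_sectors_per_mr - 1) /
		   max_sectors_per_mr;
}

/*
 * Total MR demand, assuming one MR pool per channel that is sized
 * queue_size * mr_per_cmd. Compare the result with RXE_MAX_MR.
 */
static unsigned long total_mrs(unsigned int ch_count,
			       unsigned int queue_size,
			       unsigned int mr_per_cmd)
{
	return (unsigned long)ch_count * queue_size * mr_per_cmd;
}
```

With the logged values, sectors_per_mr(512, 4096) gives 4096 and mrs_per_cmd(1024, 4096) gives 2, which agrees with max_sectors_per_mr = 4096 and mr_per_cmd = 2 in the log. Whether the total stays below the RXE_MAX_MR value before the patch (0x00001000, i.e. 4096) depends on the channel count and queue depth in use.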
Bart.