Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask

[PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Leon Romanovsky <leon@kernel.org> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Leon Romanovsky <leon@kernel.org> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Leon Romanovsky <leon@kernel.org> · 2018-07-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-17
RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-07-18
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-18
RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-18
RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-18
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-19
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-19
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-20
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Jason Gunthorpe <hidden> · 2018-07-23
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-23
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-30
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-07-31
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-08-01
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Max Gurtovoy <hidden> · 2018-08-01
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-08-06
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Leon Romanovsky <leon@kernel.org> · 2018-08-15
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-08-16
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-08-16
RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-08-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-08-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Jason Gunthorpe <hidden> · 2018-08-17
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Sagi Grimberg <sagi@grimberg.me> · 2018-08-17
RE: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-08-18
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-24
Re: [PATCH mlx5-next] RDMA/mlx5: Don't use cached IRQ affinity mask · Steve Wise <hidden> · 2018-07-24

From: Max Gurtovoy <hidden>
Date: 2018-07-17 09:18:25
Also in: linux-rdma


On 7/16/2018 8:08 PM, Steve Wise wrote:

Hey Max:

Hey,

On 7/16/2018 11:46 AM, Max Gurtovoy wrote:



On 7/16/2018 5:59 PM, Sagi Grimberg wrote:


Hi,
I've tested this patch and it seems problematic at the moment.

Problematic how? What are you seeing?

Connection failures and same error Steve saw:

[Mon Jul 16 16:19:11 2018] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[Mon Jul 16 16:19:11 2018] nvme nvme0: failed to connect queue: 2 ret=-18


maybe this is because of the bug that Steve mentioned in the NVMe
mailing list. Sagi mentioned that we should fix it in the NVMe/RDMA
initiator and I'll run his suggestion as well.

Is your device irq affinity linear?

When it's linear and the balancer is stopped the patch works.


BTW, when I run the blk_mq_map_queues it works for every irq affinity.

But it's probably not aligned to the device vector affinity.

but I guess it's better in some cases.

I've checked the situation before Leon's patch and set all the vectors
to CPU 0. In this case (I think that this was the initial report by
Steve), we use the affinity_hint (Israel's and Saeed's patches, where we
use dev->priv.irq_info[vector].mask) and it worked fine.

Steve,
Can you share your configuration (kernel, HCA, affinity map, connect
command, lscpu) ?
I want to repro it in my lab.

- linux-4.18-rc1 + the nvme/nvmet inline_data_size patches + patches to
enable ib_get_vector_affinity() in cxgb4 + sagi's patch + leon's mlx5
patch so I can change the affinity via procfs.

Ohh, now I understand: you were complaining that affinity changes are
not reflected in mlx5_ib_get_vector_affinity, not about the connect
failures when the affinity overlaps (those worked fine before Leon's
patch).
So this is a known issue, since we used a static hint from
dev->priv.irq_info[vector].mask that never changes.

IMO we must fulfil the user's wish to connect to N queues and not reduce
it because of affinity overlaps. So in order to push Leon's patch we
must also fix blk_mq_rdma_map_queues to do a best-effort mapping
according to the affinity and map the rest in a naive way (that way we
will *always* map all the queues).

-Max.
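
The mapping proposed above (honour the vector affinity where possible, then place the remaining queues naively so that every queue ends up mapped) could be modelled roughly as follows. This is a userspace Python sketch and not kernel code: the function name map_queues_best_effort, the list-of-sets input and the three-pass structure are assumptions made for illustration, not the actual blk_mq_rdma_map_queues implementation or the fix that was eventually merged. It also assumes the number of queues does not exceed the number of CPUs.

```python
def map_queues_best_effort(nr_cpus, affinity):
    """Return a list where entry [cpu] is the queue index serving that CPU.

    affinity[q] is the set of CPUs that queue q's completion vector is
    affine to (a hypothetical stand-in for a per-vector affinity mask).
    """
    nr_queues = len(affinity)
    if nr_queues == 0 or nr_queues > nr_cpus:
        raise ValueError("sketch assumes 1 <= nr_queues <= nr_cpus")

    cpu_to_queue = [None] * nr_cpus
    unmapped = []

    # Pass 1 (best effort): each queue claims one still-free CPU from its
    # own affinity mask. Queues whose mask is fully taken are deferred.
    for q, mask in enumerate(affinity):
        free = [c for c in sorted(mask)
                if 0 <= c < nr_cpus and cpu_to_queue[c] is None]
        if free:
            cpu_to_queue[free[0]] = q
        else:
            unmapped.append(q)

    # Pass 2 (naive): deferred queues take any free CPU, ignoring affinity.
    # There are always enough free CPUs because nr_queues <= nr_cpus.
    free_cpus = [c for c in range(nr_cpus) if cpu_to_queue[c] is None]
    for q, cpu in zip(unmapped, free_cpus):
        cpu_to_queue[cpu] = q

    # Pass 3: CPUs still without a queue prefer a queue that is affine to
    # them, and otherwise fall back to a simple modulo spread.
    for cpu in range(nr_cpus):
        if cpu_to_queue[cpu] is None:
            affine = [q for q, mask in enumerate(affinity) if cpu in mask]
            cpu_to_queue[cpu] = affine[0] if affine else cpu % nr_queues

    return cpu_to_queue
```

With all four vectors pinned to CPU 0 (the overlapping case discussed in this thread), map_queues_best_effort(4, [{0}, {0}, {0}, {0}]) still gives each queue its own CPU, whereas a purely affinity-driven mapping would leave three queues with no CPU at all.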
