Re: [PATCH] blk-mq: Do not lookup ctx with invalid index
From: Ming Lei <hidden>
Date: 2021-06-15 03:37:55
Also in:
lkml
On Mon, Jun 14, 2021 at 01:37:06PM +0200, Daniel Wagner wrote:
> On Tue, Jun 08, 2021 at 08:33:39PM +0200, Daniel Wagner wrote:
> > cpumask_first_and() returns >= nr_cpu_ids if the two provided masks do
> > not share a common bit. Verify we get a valid value back from
> > cpumask_first_and().
>
> So I got feedback on this issue (but not on the patch itself yet). The
> system starts with 16 virtual CPU cores, and during the test 4 cores are
> removed[1]. As soon as there is an error on the storage side, the reset
> code on the host ends up in this path and crashes.
>
> I still don't understand why the CPU removal is not updating the CPU
> mask correctly before we hit the reset path. I'll continue to
> investigate.

We don't update hctx->cpumask when a CPU is added or removed; it is assigned based on cpu_possible_mask from the beginning. This is a long-standing issue, and it can be triggered when all CPUs in hctx->cpumask go offline.

The thing is that only nvmf_connect_io_queue() allocates a request via a specified hctx.

thanks,
Ming
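
For illustration, the failure mode discussed in this thread can be sketched as a small userspace model. This is not kernel code: the `model_*` names, the 16-bit masks and the NULL-returning lookup are assumptions made up for the example, and the check only mirrors the idea of the patch (validate the index before using it), not its actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: 16 CPUs, one bit per CPU (hypothetical). */
#define MODEL_NR_CPU_IDS 16u

/* Stand-in for a per-CPU software queue context (hypothetical). */
struct model_ctx {
	unsigned int cpu;
};

/*
 * Models the behaviour described for cpumask_first_and(): return the
 * index of the first bit set in both masks, or a value
 * >= MODEL_NR_CPU_IDS when the masks share no common bit.
 */
static unsigned int model_first_and(uint16_t a, uint16_t b)
{
	uint16_t both = (uint16_t)(a & b);
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (both & (1u << cpu))
			return cpu;

	return MODEL_NR_CPU_IDS;
}

/*
 * Hypothetical lookup: hctx_mask is fixed at setup time and never
 * updated, online_mask shrinks when CPUs go away. Return NULL instead
 * of indexing the per-CPU array with an invalid CPU number.
 */
static struct model_ctx *model_pick_ctx(struct model_ctx *ctxs,
					uint16_t hctx_mask,
					uint16_t online_mask)
{
	unsigned int cpu = model_first_and(hctx_mask, online_mask);

	if (cpu >= MODEL_NR_CPU_IDS)
		return NULL;

	return &ctxs[cpu];
}
```

With a hypothetical hctx mask of 0xF000 (CPUs 12-15) and all 16 CPUs online, the lookup picks CPU 12. Once those four CPUs are gone (online mask 0x0FFF), the two masks no longer intersect and the lookup returns NULL rather than reading past the per-CPU array.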