Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk (also 4.12 stable)
From: Christian Borntraeger <hidden>
Date: 2017-12-07 09:20:27
Also in:
linux-s390, lkml
On 12/07/2017 12:29 AM, Christoph Hellwig wrote:
> On Wed, Dec 06, 2017 at 01:25:11PM +0100, Christian Borntraeger wrote:
>> commit 11b2025c3326f7096ceb588c3117c7883850c068 -> bad
>>     blk-mq: create a blk_mq_ctx for each possible CPU
>> does not boot on DASD and
>> commit 9c6ae239e01ae9a9f8657f05c55c4372e9fc8bcc -> good
>>     genirq/affinity: assign vectors to all possible CPUs
>> does boot with DASD disks.
>>
>> Also adding Stefan Haberland if he has an idea why this fails on DASD
>> and adding Martin (for the s390 irq handling code).
>
> That is interesting as it really isn't related to interrupts at all,
> it just ensures that possible CPUs are set in ->cpumask.
>
> I guess we'd really want:
>
> e005655c389e3d25bf3e43f71611ec12f3012de0
> "blk-mq: only select online CPUs in blk_mq_hctx_next_cpu"
>
> before this commit, but it seems like the whole stack didn't work for
> your either.
>
> I wonder if there is some weird thing about nr_cpu_ids in s390?

The problem starts as soon as NR_CPUS is larger than the number of real CPUs.

A question: wouldn't your change in blk_mq_hctx_next_cpu fail if there is more than one non-online CPU? E.g. don't we need something like this (whitespace and indent damaged):

@@ -1241,11 +1241,11 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
 	if (--hctx->next_cpu_batch <= 0) {
 		int next_cpu;
 
+		do {
 		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
-		if (!cpu_online(next_cpu))
-			next_cpu = cpumask_next(next_cpu, hctx->cpumask);
 		if (next_cpu >= nr_cpu_ids)
 			next_cpu = cpumask_first(hctx->cpumask);
+		} while (!cpu_online(next_cpu));
 
 		hctx->next_cpu = next_cpu;
 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;

It does not fix the issue, though (and it would be pretty inefficient for large NR_CPUS).
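
For illustration only, here is a small user-space sketch of the "skip every non-online CPU" idea from the hunk above. It is not kernel code and not a proposed patch: the toy_* helpers and the 64-bit mask are hypothetical stand-ins for cpumask_next(), cpumask_first() and cpu_online(). One difference from the hunk is an assumption of mine: the search advances from the last candidate instead of restarting from hctx->next_cpu, and it is bounded to one lap so it terminates even if no CPU in the mask is online.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids; the masks below are plain 64-bit words. */
#define TOY_NR_CPU_IDS 64

/*
 * Return the first set bit strictly after 'cpu', or TOY_NR_CPU_IDS if there
 * is none. Passing cpu = -1 yields the first set bit in the mask.
 */
static int toy_cpumask_next(int cpu, uint64_t mask)
{
	for (int i = cpu + 1; i < TOY_NR_CPU_IDS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return TOY_NR_CPU_IDS;
}

/* True if 'cpu' is a valid index and its bit is set in the 'online' mask. */
static bool toy_cpu_online(int cpu, uint64_t online)
{
	return cpu >= 0 && cpu < TOY_NR_CPU_IDS &&
	       (online & (UINT64_C(1) << cpu));
}

/*
 * Round-robin pick of the next online CPU in 'mask' after 'cur'.
 * Returns -1 if the mask is empty or none of its CPUs is online.
 */
static int toy_next_online_cpu(int cur, uint64_t mask, uint64_t online)
{
	int cpu = cur;

	/* One lap at most: every set bit in the mask is visited before giving up. */
	for (int tries = 0; tries < TOY_NR_CPU_IDS; tries++) {
		cpu = toy_cpumask_next(cpu, mask);
		if (cpu >= TOY_NR_CPU_IDS)
			cpu = toy_cpumask_next(-1, mask);	/* wrap around */
		if (cpu >= TOY_NR_CPU_IDS)
			return -1;				/* empty mask */
		if (toy_cpu_online(cpu, online))
			return cpu;
	}
	return -1;
}
```

With CPUs 0-7 in the mask and only CPUs 0 and 1 online, a search starting at CPU 1 walks past the six offline CPUs and wraps back to CPU 0. That linear walk is the cost referred to above for large NR_CPUS.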