Thread: 52 messages, 7 authors, 2018-01-12

Re: 4.14: WARNING: CPU: 4 PID: 2895 at block/blk-mq.c:1144 with virtio-blk

From: Christian Borntraeger <hidden>
Date: 2017-11-20 20:49:15
Also in: lkml


On 11/20/2017 08:42 PM, Jens Axboe wrote:
On 11/20/2017 12:29 PM, Christian Borntraeger wrote:

On 11/20/2017 08:20 PM, Bart Van Assche wrote:
On Fri, 2017-11-17 at 15:42 +0100, Christian Borntraeger wrote:
This is 

b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1141)     * are mapped to it.
b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1142)     */
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1143)    WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1144)            cpu_online(hctx->next_cpu));
6a83e74d2 (Bart Van Assche           2016-11-02 10:09:51 -0600 1145) 
b7a71e66d (Jens Axboe                2017-08-01 09:28:24 -0600 1146)    /*
Did you really try to figure out when the code that reported the warning
was introduced? I think that warning was introduced through the following
commit:
This was more of a cut'n'paste to show which warning triggered, since line numbers are somewhat volatile.
commit fd1270d5df6a005e1248e87042159a799cc4b2c9
Date:   Wed Apr 16 09:23:48 2014 -0600

    blk-mq: don't use preempt_count() to check for right CPU
     
    UP or CONFIG_PREEMPT_NONE will return 0, and what we really
    want to check is whether or not we are on the right CPU.
    So don't make PREEMPT part of this, just test the CPU in
    the mask directly.

Anyway, I think that warning is appropriate and useful. So the next step
is to figure out what work item was involved and why that work item got
executed on the wrong CPU.
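
To make the condition under discussion concrete, here is a minimal user-space sketch of the quoted WARN_ON() predicate. It is not kernel code: `struct hctx_model`, `cpu_in_mask()` and `would_warn()` are hypothetical names, and CPU sets are modeled as plain 64-bit bitmasks rather than `struct cpumask`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct blk_mq_hw_ctx: only the two fields
 * that the quoted check reads. CPU sets are 64-bit bitmasks here. */
struct hctx_model {
    uint64_t cpumask;   /* CPUs mapped to this hardware queue */
    int next_cpu;       /* CPU the next queue run is expected on */
};

/* User-space analogue of cpumask_test_cpu() / cpu_online(). */
static bool cpu_in_mask(int cpu, uint64_t mask)
{
    assert(cpu >= 0 && cpu < 64);   /* the model only covers 64 CPUs */
    return (mask >> cpu) & 1u;
}

/* Mirrors the quoted condition:
 *   !cpumask_test_cpu(this_cpu, hctx->cpumask) && cpu_online(hctx->next_cpu)
 * It is true only when the code runs on a CPU that is not mapped to the
 * queue while the CPU it should have run on is still online. */
static bool would_warn(const struct hctx_model *h, int this_cpu,
                       uint64_t online_mask)
{
    return !cpu_in_mask(this_cpu, h->cpumask) &&
           cpu_in_mask(h->next_cpu, online_mask);
}
```

With a queue mapped to CPUs 0 and 1 (`cpumask = 0x3`, `next_cpu = 0`), running on CPU 4 satisfies the condition as long as CPU 0 is online, and running on CPU 1 never does.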
It seems to be related to virtio-blk (it is triggered by fio on such disks). Your comment basically
says: "no, this is not a known issue" then :-)
I will try to take a dump to find out the work item.
blk-mq does not attempt to freeze/sync existing work if a CPU goes away,
and we reconfigure the mappings. So I don't think the above is unexpected,
if you are doing CPU hot unplug while running a fio job.
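
The ordering described above can be sketched as a small user-space model: a queue run is scheduled for a CPU, the mapping is then reconfigured without touching the pending work, and the run finally executes on a CPU that is no longer in the queue's mask. All names here (`queue_model`, `work_model`, `queue_run()`, `remap()`, `run_warns()`) are hypothetical, and the model leaves out how the kernel workqueue actually places or migrates work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a hardware queue and of one pending queue run. */
struct queue_model {
    uint64_t cpumask;   /* CPUs currently mapped to the queue */
    int next_cpu;       /* CPU picked for the next run */
};

struct work_model {
    int run_cpu;        /* CPU chosen when the run was queued */
};

static bool test_cpu(int cpu, uint64_t mask)
{
    assert(cpu >= 0 && cpu < 64);   /* the model only covers 64 CPUs */
    return (mask >> cpu) & 1u;
}

/* Queue a run on the CPU the queue currently prefers. */
static struct work_model queue_run(const struct queue_model *q)
{
    return (struct work_model){ .run_cpu = q->next_cpu };
}

/* Reconfigure the mapping. Pending work is deliberately left alone,
 * which is the behavior the message above describes. */
static void remap(struct queue_model *q, uint64_t new_mask, int new_next_cpu)
{
    q->cpumask = new_mask;
    q->next_cpu = new_next_cpu;
}

/* Same predicate as the quoted WARN_ON(), evaluated when the run executes. */
static bool run_warns(const struct queue_model *q, const struct work_model *w,
                      uint64_t online_mask)
{
    return !test_cpu(w->run_cpu, q->cpumask) &&
           test_cpu(q->next_cpu, online_mask);
}
```

In this model, a run queued for CPU 2 while the queue maps CPUs 2 and 3 is fine if it executes before the remap. If the queue is remapped to CPUs 0 and 1 first, the same run executes on a CPU outside the mask and the condition fires.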
I did a CPU hotplug (adding a CPU) and started fio AFTER that.

 
While it's a bit annoying that we trigger the WARN_ON() for a condition
that can happen, we're basically interested in it if it triggers for
normal operations.
I think we should never trigger a WARN_ON on conditions that can happen. I know some
folks enable panic_on_warn to detect/avoid data integrity issues. FWIW, this also seems
to happen with 4.13 and 4.12.