Re: Hard lockup in blk_mq_free_request() / wbt_done() / wake_up_all()
From: Jens Axboe <axboe@kernel.dk>
Date: 2018-06-12 16:22:50
Also in: lkml
On 6/12/18 10:19 AM, Chris Boot wrote:
> On 12/06/18 17:09, Jens Axboe wrote:
>> On 6/12/18 9:38 AM, Chris Boot wrote:
>>> Hi folks, I maintain a large (to me) system with 112 threads (4x Intel E7-4830 v4) which has a MegaRAID SAS 9361-24i controller. This system is currently running Debian's 4.16.12 kernel (from stretch-backports) with blk_mq enabled. I've run into a lockup which appears to involve blk_mq and writeback throttling. It's hard to tell if I've run into this same thing with older kernels; I'm trying to track down a deadlock that I've so far been fairly certain involved the OOM killer, but this one doesn't seem to.
>>> [snip]
>>
>> Hmm, that's really weird; I don't see how we could be spinning on the waitqueue lock like that. I haven't seen any wbt bug reports like this before. Are things generally stable if you just turn off wbt? You can do that for sda, for instance, by doing:
>>
>> # echo 0 > /sys/block/sda/queue/wbt_lat_usec
>>
>> It'd be interesting to get this data point, e.g. leave blk-mq enabled and then just disable wbt.
>
> Hi Jens,
>
> Thanks for the speedy response. I'll see if I can get that tested soon; if the system is stable without blk_mq, I can see the users wanting to keep it that way for a while. I'll let you know.
Understandable. I just get suspicious of the general state of the system, if it's locking up there. Could be a hardware issue, or a bug in some other area that's messing things up. I have wbt running on literally hundreds of thousands of boxes and haven't seen a lockup like this.
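For a test like the one suggested in the sysfs command quoted earlier, it helps not to lose the original setting. What follows is a minimal sketch, not something from the thread: it assumes the `wbt_lat_usec` attribute from that command, that writing 0 disables wbt, and that writing the previously read value back re-enables it. The script name `wbt.sh`, the `SYSFS_BLOCK` override and the `disable_wbt`/`restore_wbt` helpers are hypothetical.

```sh
#!/bin/sh
# Sketch (hypothetical helper): disable writeback throttling (wbt) on one
# device and print the previous wbt_lat_usec value so it can be restored.
# SYSFS_BLOCK is a hypothetical override for testing; defaults to /sys/block.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

# Print the current value, then write 0 to disable wbt for the queue.
# Returns non-zero if the attribute is missing or cannot be read/written.
disable_wbt() {
    f="$SYSFS_BLOCK/$1/queue/wbt_lat_usec"
    prev=$(cat "$f") || return 1
    echo 0 > "$f" || return 1
    echo "$prev"
}

# Write a previously saved value back to re-enable wbt.
restore_wbt() {
    echo "$2" > "$SYSFS_BLOCK/$1/queue/wbt_lat_usec"
}

# Example (as root):
#   old=$(./wbt.sh disable sda)
#   ./wbt.sh restore sda "$old"
case "${1:-}" in
    disable) disable_wbt "$2" ;;
    restore) restore_wbt "$2" "$3" ;;
esac
```

Saving the old value rather than hard-coding a default keeps the sketch independent of whatever default latency the kernel picked for the device.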
>> Is anything disabling wbt in the system otherwise?
> Not that I'm aware of, no.
OK, just wanted to rule out something related to the shutdown path racing with IO.

-- 
Jens Axboe
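Whether anything else on the system has disabled wbt can be checked by reading the same `wbt_lat_usec` attribute for every block device. This is a minimal sketch under stated assumptions: a value of 0 means wbt is off for that queue, and reading the attribute may fail on some kernels when wbt was never set up for the queue (an assumption, handled by reporting it as unreadable). `SYSFS_BLOCK` and `report_wbt` are hypothetical names.

```sh
#!/bin/sh
# Sketch (hypothetical helper): list wbt_lat_usec for every block device.
# A value of 0 is taken to mean wbt is disabled for that queue.
# SYSFS_BLOCK is a hypothetical override for testing; defaults to /sys/block.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

report_wbt() {
    for f in "$SYSFS_BLOCK"/*/queue/wbt_lat_usec; do
        [ -e "$f" ] || continue           # glob matched nothing
        dev=${f#"$SYSFS_BLOCK"/}
        dev=${dev%%/*}                    # "sda/queue/wbt_lat_usec" -> "sda"
        if val=$(cat "$f" 2>/dev/null); then
            if [ "$val" = 0 ]; then
                echo "$dev: wbt disabled"
            else
                echo "$dev: wbt enabled, target latency $val usec"
            fi
        else
            # Assumed: reading can fail if wbt was never initialised.
            echo "$dev: wbt_lat_usec unreadable"
        fi
    done
}

report_wbt
```

A device reported as disabled that nobody set by hand would suggest some other component (a boot script or udev rule, for instance) is touching the setting.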