Re: Random high CPU utilization in blk-mq with the none scheduler
From: Jens Axboe <axboe@kernel.dk>
Date: 2021-12-11 02:05:14
On 12/10/21 6:29 PM, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Thursday, December 9, 2021 7:30 PM
> >
> > Hi all,
> > I found a random high CPU utilization issue with some database benchmark
> > program running on a 192-CPU virtual machine (VM). Originally the issue
> > was found with RHEL 8.4 and Ubuntu 20.04, and further tests show that the
> > issue also reproduces with the latest upstream stable kernel v5.15.7, but
> > *not* with v5.16-rc1. It looks like someone resolved the issue in
> > v5.16-rc1 recently?
>
> I did git-bisect on the linux-block tree's for-5.16/block branch and this
> patch resolves the random high CPU utilization issue (I'm not sure how):
> dc5fc361d891 ("block: attempt direct issue of plug list")
> https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.16/block&id=dc5fc361d891e089dfd9c0a975dc78041036b906
>
> Do you think it's easy to backport it to earlier versions like 5.10?
> It looks like there are a lot of prerequisite patches.
It's more likely the real fix is avoiding the repeated plug list scan,
which I guess makes sense. That is this commit:
commit d38a9c04c0d5637a828269dccb9703d42d40d42b
Author: Jens Axboe
Date:   Thu Oct 14 07:24:07 2021 -0600

    block: only check previous entry for plug merge attempt
If that's the case, try 5.15.x again and do:

    echo 2 > /sys/block/<dev>/queue/nomerges

for each drive you are using in the IO test, and see if that gets
rid of the excess CPU usage.
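If the test spans several drives, the same write can be looped. This is only a minimal sketch, assuming POSIX sh and root privileges: the set_nomerges function and the SYSFS_ROOT override are hypothetical names introduced here (the override exists only so the loop can be dry-run against a scratch directory), and sda/sdb in the usage line are placeholder device names. Writing 2 tells the block layer to try no merge algorithms at all; writing 0 afterwards restores the default.

```shell
#!/bin/sh
# Hypothetical helper (not part of any kernel tooling): write 2 to the
# nomerges attribute of every device named on the command line.
# SYSFS_ROOT is an assumed override for dry runs; it defaults to /sys/block.
set_nomerges() {
    root=${SYSFS_ROOT:-/sys/block}
    rc=0
    for dev in "$@"; do
        attr="$root/$dev/queue/nomerges"
        if [ -w "$attr" ]; then
            # 2 = try no merge algorithms for this queue
            echo 2 > "$attr"
            printf '%s: nomerges=%s\n' "$dev" "$(cat "$attr")"
        else
            printf '%s: cannot write %s\n' "$dev" "$attr" >&2
            rc=1
        fi
    done
    return $rc
}
```

Usage would look like `set_nomerges sda sdb`, run before starting the IO test. The function reads each value back after writing so that a silently ignored write is visible in the output.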
--
Jens Axboe