Re: 5.15+, blocked tasks, folio_wait_bit_common
From: Nikolay Borisov <hidden>
Date: 2021-11-12 18:46:46
Also in:
linux-fsdevel
[CC'ing Omar as Kyber is mentioned]

On 12.11.21 20:06, Chris Murphy wrote:
On Fri, Nov 12, 2021 at 1:55 AM Nikolay Borisov [off-list ref] wrote:
On 11.11.21 22:57, Chris Murphy wrote:
On Thu, Nov 11, 2021 at 3:24 PM Chris Murphy [off-list ref] wrote:
Soon after logging in and launching some apps, I get a hang. Although there's lots of btrfs stuff in the call traces, I think we're stuck in writeback, so everything else just piles up and it all hangs indefinitely. Happening since at least 5.16.0-0.rc0.20211109gitd2f38a3c6507.9.fc36.x86_64 and is still happening with 5.16.0-0.rc0.20211111gitdebe436e77c7.11.fc36.x86_64. Full dmesg, including sysrq+w when the journal becomes unresponsive, and then a bunch of blocked tasks (> 120s) roll in on their own: https://bugzilla-attachments.redhat.com/attachment.cgi?id=1841283

The btrfs traces in this one don't look interesting. What's interesting is that you have a bunch of tasks, including the btrfs transaction commit, which are stuck waiting to get a tag from the underlying block device (the blk_mq_get_tag function). This indicates something's going on with the underlying block device.

Well, the hang doesn't ever happen with 5.14.x or 5.15.x kernels, only the misc-next (Fedora rc0) kernels. And also I just discovered that it's not happening (or not as quickly) with the IO scheduler set to none. I've been using kyber, and when I switch back to it, the hang happens almost immediately.
Well, I see a bunch of WARN_ONs being triggered, so is it possible that this is an issue that will be fixed in a future RC?

Omar, what steps should be taken to debug this from the Kyber side of things?
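
For anyone wanting to repeat the none-vs-kyber comparison described above, here is a minimal sketch of reading and switching the IO scheduler through sysfs. It assumes the usual /sys/block/<device>/queue/scheduler file, where the active scheduler is shown in square brackets. The helper names and the device name "sda" are made up for illustration and are not taken from the report; writing to the file needs root.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel lists every available scheduler on one line and marks
    the active one with square brackets, e.g. "mq-deadline [kyber] none".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    """Path of the scheduler file for a block device such as "sda"."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def get_scheduler(device):
    """Return (active, available) for the given block device."""
    return parse_schedulers(scheduler_path(device).read_text())


def set_scheduler(device, name):
    """Switch the scheduler of a block device (requires root)."""
    scheduler_path(device).write_text(name)


# Hypothetical usage, with "sda" standing in for the real device:
#   print(get_scheduler("sda"))
#   set_scheduler("sda", "none")   # the setup where the hang is not seen
#   set_scheduler("sda", "kyber")  # the setup where the hang reproduces
```

Checking get_scheduler() before and after each switch confirms which scheduler was actually active when a hang did or did not occur.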