Thread: 20 messages, 7 authors, 2022-09-26

Re: regression caused by block: freeze the queue earlier in del_gendisk

From: Jens Axboe <axboe@kernel.dk>
Date: 2022-09-20 14:05:49
Also in: linux-block, lkml, regressions

On 9/20/22 3:11 AM, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> On 13.09.22 04:36, Dusty Mabe wrote:
>> On 9/12/22 21:55, Ming Lei wrote:
>>> On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
>>>> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
>>>>> On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
>>>>>> On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
>>>>>>> It is a bit hard to associate the above commit with the reported issue.
>>>>>>
>>>>>> So the messages clearly are about something trying to open a device
>>>>>> that went away at the block layer, but somehow does not get removed
>>>>>> in time by udev (which seems to be a userspace bug in CoreOS).  But
>>>>>> even with that we really should not hang.
>>>>>
>>>>> Xiao Ni provides a script[1] which can more or less reproduce the issue.
>>>>
>>>> I've run the reproducer 10000 times on current mainline, and while
>>>> it prints one of the autoloading messages per run, I've not actually
>>>> seen any kind of hang.
>>>
>>> I can't reproduce the hang either.
>>
>> I can obviously reproduce the issue with the test in our Fedora CoreOS
>> test suite. It's part of a framework (i.e. it's not simply a script
>> you can run), but it is very reproducible, so one can add some
>> instrumentation to the kernel and feed it through a build/test cycle
>> to see different results or logs.
>>
>> I'm willing to share this with other people (maybe a screen share or
>> some written-down instructions) if anyone would be interested.
> 
> This thread looked stalled, or was there any progress in the past week?
> If not: Fedora apparently removed the patch from their kernels a while
> ago, as quite a few users were hitting it. What is preventing us from
> doing the same in mainline and 5.19.y until the issue can be resolved?
> The description of a09b314005f3 ("block: freeze the queue earlier in
> del_gendisk") doesn't sound like the change does something crucial that
> can't wait a bit. I might be totally wrong with that, but I think it's
> my duty to ask that question at this point.

Christoph and I discussed this one last week, and he has a plan to try
a flag approach. Christoph, did you get a chance to bang that out? Would
be nice to get this one wrapped up.

-- 
Jens Axboe
