Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
From: Christoph Hellwig <hch@infradead.org>
Date: 2022-05-31 06:12:03
Also in: linux-block
On Thu, May 26, 2022 at 01:53:36PM +0200, Jan Kara wrote:
> So I've debugged this. The crash happens on the very first bio submitted
> to the md0 device. The problem is that this bio gets remapped to loop0 -
> this happens through bio_alloc_clone() -> __bio_clone() which ends up
> calling bio_clone_blkg_association(). Now the resulting bio is
> inconsistent - its dst_bio->bi_bdev is pointing to loop0 while
> dst_bio->bi_blkg is pointing to the blkcg_gq associated with the md0
> request queue. And this breaks BFQ because when this bio is inserted into
> the loop0 request queue, BFQ looks at bio->bi_blkg->q (it is a bit more
> complex than that but this is the gist of the problem) and expects its
> data there, but BFQ is not initialized for the md0 request_queue.
>
> Now I think this is a bug in __bio_clone(), but the inconsistency in the
> bio is very much what we asked bio_clone_blkg_association() to do, so
> maybe I'm missing something and bios that are associated with one bdev
> but pointing to the blkg of another bdev are fine and controllers are
> supposed to handle that (although I'm not sure how they should do that).
> So I'm asking here before I just go and delete
> bio_clone_blkg_association() from __bio_clone()...
This behavior probably goes back to my commit here:
commit d92c370a16cbe0276954c761b874bd024a7e4fac
Author: Christoph Hellwig [off-list ref]
Date: Sat Jun 27 09:31:48 2020 +0200
block: really clone the block cgroup in bio_clone_blkg_association
and it seems everyone else was fine with that behavior so far.
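
For anyone following along, the mismatch Jan describes can be sketched with a toy userspace model. This is not kernel code: every toy_* name and struct layout below is a hypothetical stand-in, and the real structures and the real BFQ lookup are more involved. It only shows a clone that takes its bdev from the caller while copying the blkg verbatim from the source bio:

```c
/*
 * Toy userspace model of the bdev/blkg mismatch. NOT kernel code:
 * all toy_* types and functions are simplified, hypothetical stand-ins.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_queue {
	const char *name;
	bool bfq_initialized;	/* does this queue have BFQ data set up? */
};

struct toy_bdev {
	struct toy_queue *q;	/* request queue of this block device */
};

struct toy_blkg {
	struct toy_queue *q;	/* stands in for blkcg_gq->q */
};

struct toy_bio {
	struct toy_bdev *bi_bdev;
	struct toy_blkg *bi_blkg;
};

/*
 * Models the clone path described in the thread: the clone is pointed at
 * the lower device, but the blkg association is copied from the source.
 */
static void toy_clone_to(struct toy_bio *dst, const struct toy_bio *src,
			 struct toy_bdev *lower)
{
	dst->bi_bdev = lower;
	dst->bi_blkg = src->bi_blkg;
}

/* True if the blkg belongs to the same queue the bio is headed for. */
static bool toy_bio_consistent(const struct toy_bio *bio)
{
	return bio->bi_blkg->q == bio->bi_bdev->q;
}

/*
 * Models the gist of the scheduler lookup: it follows bi_blkg->q and
 * expects its own data there.
 */
static bool toy_sched_data_present(const struct toy_bio *bio)
{
	return bio->bi_blkg->q->bfq_initialized;
}
```

With an md0 queue that has no scheduler data and a loop0 queue that does, a bio cloned from md0 to loop0 fails both toy_bio_consistent() and toy_sched_data_present(), which is the shape of the crash in the report.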