Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
From: Omar Sandoval <osandov@osandov.com>
Date: 2017-03-01 07:26:53
On Tue, Feb 28, 2017 at 11:25:28PM -0800, Omar Sandoval wrote:
On Wed, Feb 22, 2017 at 11:24:25AM +0100, Jan Kara wrote:quoted
On Tue 21-02-17 10:19:28, Jens Axboe wrote:quoted
On 02/21/2017 10:09 AM, Jan Kara wrote:quoted
Hello, this is a second revision of the patch set to fix several different races and issues I've found when testing device shutdown and reuse. The first three patches are fixes to problems in my previous series fixing BDI lifetime issues. Patch 4 fixes issues with reuse of BDI name with scsi devices. With it I cannot reproduce the BDI name reuse issues using Omar's stress test using scsi_debug so it can be used as a replacement of Dan's patches. Patches 5-11 fix oops that is triggered by __blkdev_put() calling inode_detach_wb() too early (the problem reported by Thiago). Patches 12 and 13 fix oops due to a bug in gendisk code where get_gendisk() can return already freed gendisk structure (again triggered by Omar's stress test). People, please have a look at patches. They are mostly simple however the interactions are rather complex so I may have missed something. Also I'm happy for any additional testing these patches can get - I've stressed them with Omar's script, tested memcg writeback, tested static (not udev managed) device inodes. Jens, I think at least patches 1-3 should go in together with my fixes you already have in your tree (or shortly after them). It is up to you whether you decide to delay my first fixes or pick these up quickly. Patch 4 is (IMHO a cleaner) replacement of Dan's patches so consider whether you want to use it instead of those patches.I have applied 1-3 to my for-linus branch, which will go in after the initial pull request has been pulled by Linus. Consider fixing up #4 so it applies, I like it.OK, attached is patch 4 rebased on top of Linus' tree from today which already has linux-block changes pulled in. I've left put_disk_devt() in blk_cleanup_queue() to maintain the logic in the original patch (now commit 0dba1314d4f8) that request_queue and gendisk each hold one devt reference. The bdi_unregister() call that is moved to del_gendisk() by this patch is now protected by the gendisk reference instead of the request_queue one so it still maintains the property that devt reference protects bdi registration-unregistration lifetime (as much as that is not needed anymore after this patch). I have also updated the comment in the code and the changelog - they were somewhat stale after changes to the whole series Tejun suggested. HonzaHey, Jan, I just tested this out when I was seeing similar crashes with sr instead of sd, and this fixed it. Tested-by: Omar Sandoval <redacted>
Just realized it wasn't clear, I'm talking about patch 4 specifically.