Re: Time to make dynamically allocated devt the default for scsi disks?
From: Dan Williams <hidden>
Date: 2016-08-13 16:29:09
Also in:
linux-scsi
On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley [off-list ref] wrote:
On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:quoted
On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams < dan.j.williams@intel.com> wrote:quoted
On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley [off-list ref] wrote:quoted
On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:quoted
Before spending effort trying to flush the destruction of old bdi instances before new ones are registered, is it rather time to complete the conversion of sd to only use dynamically allocated devt?Do we have to go that far? Surely your fix is extensible: the only reason it doesn't work for us is that the gendisk holds the parent without a reference, so we can free the SCSI device before its child gendisk (good job no-one actually uses gendisk->parent after we've released it ...). If we fix that it would mean SCSI can't release the sdev until after the queue is dead and the bdi namespace released, so isn't something like this the easy fix? James ---diff --git a/block/genhd.c b/block/genhd.c index fcd6d4f..54ae4ae 100644 --- a/block/genhd.c +++ b/block/genhd.c@@ -514,7 +514,7 @@ static void register_disk(struct device*parent, struct gendisk *disk) struct hd_struct *part; int err; - ddev->parent = parent; + ddev->parent = get_device(parent); dev_set_name(ddev, "%s", disk->disk_name);@@ -1144,6 +1144,7 @@ static void disk_release(struct device*dev) hd_free_part(&disk->part0); if (disk->queue) blk_put_queue(disk->queue); + put_device(dev->parent); kfree(disk); } struct class block_class = {Looks ok at first glance to me. We do hold a reference on the parent device, but it gets dropped at device_unregister() time and this moves it out to the final put.We do? Where?
Yes, register_disk() does "ddev->parent = parent" and then "device_add(ddev)". device_add() takes the parent reference.
quoted
quoted
However, this does leave static devt block-device-drivers that register a disk without a parent device susceptible to the race... I think those exist given all the drivers still using add_disk() after commit 52c44d93c26f "block: remove ->driverfs_dev".It does? The race is the fact that the parent can be removed before the child meaning if the parent name is re-registered before the child dies we get a duplicate name in bdi space.
No, the race is that the *name* of the parent isn't released until the child is both unregistered and put. The device core is already ensuring that the parent is not released until all descendants have been removed.
quoted
So I tried the attached and it makes the libnvdimm unit tests start crashing.Well, the attached is clearly buggy, isn't it? You're trying to do a get on the parent before the parent is actually set.
Ah, yes, thank you. Fixed up v2 attached that passes my tests.
Why don't you just try the incremental patch I sent instead of trying to rework it?
I reworked it because it is the bdi that holds this extra dependency on the disk's parent, not the disk itself.
quoted
A couple crash logs attached. Not yet sure what assumption is getting violated, but how about that conversion of scsi to use dynamic devt? ;-)It's completely orthogonal. The problem is in hierarchy lifetimes: switching from static to dynamic allocation won't change that at all. You don't see this problem in nvme because the parent control device's lifetime belongs to the controller not the disk. In SCSI the parent is our representation of the SCSI device whose lifetime is governed at the SCSI level and effectively represents the disk.
No, it's only the name. We could achieve the same by teaching the block core to manage the "sd_index_ida" instead of the sd driver itself, but the v2-patch attached works and does not introduce that layering violation.
Attachments
- patch-v2 [application/octet-stream] 4786 bytes