Thread: 20 messages, 4 authors, 2016-09-01

Re: Time to make dynamically allocated devt the default for scsi disks?

From: Dan Williams <hidden>
Date: 2016-08-13 16:29:09
Also in: linux-scsi

On Sat, Aug 13, 2016 at 8:23 AM, James Bottomley
[off-list ref] wrote:
On Fri, 2016-08-12 at 21:57 -0700, Dan Williams wrote:
quoted
On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <dan.j.williams@intel.com> wrote:
quoted
On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley
[off-list ref] wrote:
quoted
On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote:
quoted
Before spending effort trying to flush the destruction of old bdi
instances before new ones are registered, is it rather time to
complete the conversion of sd to only use dynamically allocated devt?
Do we have to go that far?  Surely your fix is extensible: the only
reason it doesn't work for us is that the gendisk holds the parent
without a reference, so we can free the SCSI device before its child
gendisk (good job no-one actually uses gendisk->parent after we've
released it ...).  If we fix that it would mean SCSI can't release the
sdev until after the queue is dead and the bdi namespace released, so
isn't something like this the easy fix?

James

---
diff --git a/block/genhd.c b/block/genhd.c
index fcd6d4f..54ae4ae 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -514,7 +514,7 @@ static void register_disk(struct device *parent, struct gendisk *disk)
        struct hd_struct *part;
        int err;

-       ddev->parent = parent;
+       ddev->parent = get_device(parent);

        dev_set_name(ddev, "%s", disk->disk_name);
@@ -1144,6 +1144,7 @@ static void disk_release(struct device *dev)
        hd_free_part(&disk->part0);
        if (disk->queue)
                blk_put_queue(disk->queue);
+       put_device(dev->parent);
        kfree(disk);
 }
 struct class block_class = {
Looks ok at first glance to me.

We do hold a reference on the parent device, but it gets dropped at
device_unregister() time and this moves it out to the final put.
We do?  Where?
Yes, register_disk() does "ddev->parent = parent" and then
"device_add(ddev)".  device_add() takes the parent reference.
quoted
quoted
However, this does leave static devt block-device-drivers that
register a disk without a parent device susceptible to the race...
I think those exist given all the drivers still using add_disk()
after commit 52c44d93c26f "block: remove ->driverfs_dev".
It does?  The race is the fact that the parent can be removed before
the child meaning if the parent name is re-registered before the child
dies we get a duplicate name in bdi space.
No, the race is that the *name* of the parent isn't released until the
child is both unregistered and put.  The device core is already
ensuring that the parent is not released until all descendants have
been removed.
quoted
So I tried the attached and it makes the libnvdimm unit tests start
crashing.
Well, the attached is clearly buggy, isn't it?  You're trying to do a
get on the parent before the parent is actually set.
Ah, yes, thank you.  Fixed up v2 attached that passes my tests.
Why don't you
just try the incremental patch I sent instead of trying to rework it?
I reworked it because it is the bdi that holds this extra dependency
on the disk's parent, not the disk itself.
quoted
  A couple crash logs attached.  Not yet sure what assumption
is getting violated, but how about that conversion of scsi to use
dynamic devt? ;-)
It's completely orthogonal.  The problem is in hierarchy lifetimes:
switching from static to dynamic allocation won't change that at all.
 You don't see this problem in nvme because the parent control device's
lifetime belongs to the controller not the disk.  In SCSI the parent is
our representation of the SCSI device whose lifetime is governed at the
SCSI level and effectively represents the disk.
No, it's only the name.  We could achieve the same by teaching the
block core to manage the "sd_index_ida" instead of the sd driver
itself, but the v2-patch attached works and does not introduce that
layering violation.
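
To make the name-lifetime point concrete, here is a minimal userspace
sketch of the race as described above. It is a toy model, not kernel
code: toy_disk, index_used, name_used and every toy_* function are
hypothetical names invented for illustration. The assumption is that a
driver-owned index (standing in for something like sd_index_ida) is
handed back at unregister time, while the name derived from it is only
released on the final put.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX 4

/* Hypothetical stand-ins, not kernel APIs:
 * index_used models a driver-owned index allocator,
 * name_used models a namespace keyed by that index. */
static bool index_used[TOY_MAX];
static bool name_used[TOY_MAX];

struct toy_disk {
	int index;
	int refs;
	bool hold_index;	/* keep the index until the final put? */
};

static void toy_reset(void)
{
	memset(index_used, 0, sizeof(index_used));
	memset(name_used, 0, sizeof(name_used));
}

static int toy_alloc_index(void)
{
	for (int i = 0; i < TOY_MAX; i++) {
		if (!index_used[i]) {
			index_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 if no index is free, -2 on a duplicate name. */
static int toy_register(struct toy_disk *d, bool hold_index)
{
	int idx = toy_alloc_index();

	if (idx < 0)
		return -1;
	if (name_used[idx]) {
		/* The index was recycled while its old name is still live. */
		index_used[idx] = false;
		return -2;
	}
	name_used[idx] = true;
	d->index = idx;
	d->refs = 1;
	d->hold_index = hold_index;
	return 0;
}

static void toy_get(struct toy_disk *d)
{
	d->refs++;
}

/* Final put releases the name, and the index too if it was held. */
static void toy_put(struct toy_disk *d)
{
	if (--d->refs == 0) {
		name_used[d->index] = false;
		if (d->hold_index)
			index_used[d->index] = false;
	}
}

/* Unregister drops the registration reference; without hold_index the
 * index goes back to the allocator immediately. */
static void toy_unregister(struct toy_disk *d)
{
	if (!d->hold_index)
		index_used[d->index] = false;
	toy_put(d);
}

/* Register a second disk while the first is unregistered but still
 * referenced, and report how that second registration went. */
static int toy_demo(bool hold_index)
{
	struct toy_disk a, b;
	int rc;

	toy_reset();
	assert(toy_register(&a, hold_index) == 0);
	toy_get(&a);		/* a lingering user of the old disk */
	toy_unregister(&a);	/* unregistered, but not yet put */
	rc = toy_register(&b, hold_index);
	toy_put(&a);		/* the old name is released only here */
	return rc;
}
```

In this model toy_demo(false) reports the duplicate-name error, because
the recycled index maps onto a name that has not been released yet,
while toy_demo(true) succeeds because the index stays reserved until
the final put. That only illustrates the ordering problem; it says
nothing about which layer should own the index.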

Attachments

  • patch-v2 [application/octet-stream] 4786 bytes