Re: [PATCH 2/2] nvme: delete disk when last path is gone
From: Hannes Reinecke <hare@suse.de>
Date: 2021-02-25 08:37:24
On 2/24/21 11:40 PM, Sagi Grimberg wrote:
>> The multipath code currently deletes the disk only after all references to it are dropped rather than when the last path to that disk is lost. This has been reported to cause problems with some use cases like MD RAID.
>
> What is the exact problem? Can you describe what the problem you see now and what you expect to see (unrelated to patch #1)?
The problem is a difference in behaviour between multipathed and non-multipathed namespaces (i.e. whether 'CMIC' is set or not). If the CMIC bit is _not_ set, the disk device will be removed once the controller is gone; if the CMIC bit is set, the disk device will be retained, and only removed once the last _reference_ is dropped. This is causing customer issues, as some vendors produce nearly identical PCI NVMe devices which differ only in the CMIC bit. So depending on which device the customer uses, he might be getting one or the other behaviour. And this is causing issues when said customer deploys MD RAID on them; with one set of devices PCI hotplug works, with the other set of devices it doesn't.
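As a rough illustration of that split, the decision can be sketched as a single predicate. This is a toy, not the driver code: the helper name and the TOY_ constant are hypothetical, the assumption that the relevant CMIC bit is bit 1 ("subsystem may contain multiple controllers") should be checked against the spec, and the multipath_enabled argument is assumed to stand in for the nvme_core multipath setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: CMIC bit 1 = "subsystem may contain multiple controllers". */
#define TOY_CMIC_MULTI_CTRL (1u << 1)

/*
 * Hypothetical helper: returns true when a namespace would be exposed
 * through a separate multipath head disk, i.e. the case in which the
 * disk outlives the controller until the last reference is dropped.
 */
static bool toy_uses_head_disk(uint8_t cmic, bool multipath_enabled)
{
    return multipath_enabled && (cmic & TOY_CMIC_MULTI_CTRL) != 0;
}
```

Two otherwise identical devices that differ only in that bit would take different branches here, which is exactly the inconsistency described above.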
>> This patch implements an alternative behaviour of deleting the disk when the last path is gone, ie the same behaviour as non-multipathed nvme devices.
>
> But we also don't remove the non-multipath'd nvme device until the last reference drops (e.g. if you have a mounted filesystem on top).
Au contraire. When doing PCI hotplug in the non-multipathed case, the controller is removed, and put_disk() is called during nvme_free_ns(). When doing PCI hotplug in the multipathed case, the controller is removed, too, but put_disk() is only called on the namespace itself; the 'nshead' disk is still kept around, and put_disk() on the 'nshead' disk is only called after the last reference is dropped.
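To make the two lifetimes easier to compare, here is a minimal sketch in C. It is a toy model under stated assumptions, not kernel code: struct toy_head, enum toy_policy and the toy_* helpers are all hypothetical, and "present" merely stands for "the disk node is still visible to users such as MD RAID".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not the nvme driver's structures. */
struct toy_head {
    int paths;     /* live controller paths to the namespace */
    int refs;      /* openers, e.g. an MD RAID array holding the disk */
    bool present;  /* is the disk node still visible? */
};

enum toy_policy {
    REMOVE_ON_LAST_REF,   /* multipath behaviour described above */
    REMOVE_ON_LAST_PATH,  /* behaviour of non-multipathed devices */
};

/* Re-evaluate whether the disk should still be present. */
static void toy_update(struct toy_head *h, enum toy_policy p)
{
    if (p == REMOVE_ON_LAST_PATH && h->paths == 0)
        h->present = false;
    if (h->paths == 0 && h->refs == 0)
        h->present = false;
}

/* A controller went away, e.g. through PCI hotplug. */
static void toy_path_lost(struct toy_head *h, enum toy_policy p)
{
    assert(h->paths > 0);
    h->paths--;
    toy_update(h, p);
}

/* An opener released the disk. */
static void toy_ref_dropped(struct toy_head *h, enum toy_policy p)
{
    assert(h->refs > 0);
    h->refs--;
    toy_update(h, p);
}
```

With one path and one opener, losing the path under REMOVE_ON_LAST_REF leaves the disk present until the opener lets go, whereas under REMOVE_ON_LAST_PATH it disappears immediately. That second outcome is what hotplug handling on top of the disk would see in the non-multipathed case.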
> This would be the equivalent to running raid on top of dm-mpath on top of scsi devices right? And if all the mpath device nodes go away the mpath device is deleted even if it has an open reference to it?
See above. The prime motivator behind this patch is to get equivalent behaviour between multipathed and non-multipathed devices. It just so happens that MD RAID exercises this particular issue.
>> The new behaviour will be selected with the 'fail_if_no_path' attribute, as returning it's arguably the same functionality.
>
> But it's not the same functionality.
Agreed. But as the first patch will be dropped (see my other mail) I'll be redoing the patchset anyway.

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer