Thread: 17 messages, 4 authors, 2017-11-10

Re: [PATCH 4/7] nvme: implement multipath access to nvme subsystems

From: Mike Snitzer <hidden>
Date: 2017-11-09 21:22:17
Also in: linux-nvme

On Thu, Nov 09 2017 at 12:44pm -0500,
Christoph Hellwig [off-list ref] wrote:
This patch adds native multipath support to the nvme driver.  For each
namespace we create only a single block device node, which can be used
to access that namespace through any of the controllers that refer to it.
The gendisk for each controller's path to the namespace still exists
inside the kernel, but is hidden from userspace.  The character device
nodes are still available on a per-controller basis.  A new link from
the sysfs directory for the subsystem allows finding all controllers
for a given subsystem.

Currently we will always send I/O to the first available path; this will
be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
ratified and implemented, at which point we will look at the ANA state
for each namespace.  Another possibility that was prototyped is to
use the path that is closest to the submitting NUMA node, which will be
mostly interesting for PCI, but might also be useful for RDMA or FC
transports in the future.  There is no plan to implement round robin
or I/O service time path selectors, as those are not scalable with
the performance rates provided by NVMe.

The multipath device will go away once all paths to it disappear;
any delay to keep it alive needs to be implemented at the controller
level.

Signed-off-by: Christoph Hellwig <hch@lst.de>
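
The "first available path" policy described in the commit message can be
sketched in plain userspace C. This is only an illustrative model, not code
from the patch: struct path, its live field, and first_available_path() are
hypothetical names, and the assumption is that paths are tried in a fixed
order and the first usable one wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one controller's path to a namespace. */
struct path {
	const char *name;
	bool live;	/* assumption: true while the controller is usable */
};

/* Return the first live path in array order, or NULL if none remains. */
const struct path *first_available_path(const struct path *paths, size_t n)
{
	assert(paths != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (paths[i].live)
			return &paths[i];
	}
	return NULL;
}
```

With three paths where only the second and third are live, the function
returns the second; once every path is marked dead it returns NULL, which
corresponds to the point at which the multipath device would go away.
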
Your 0th header speaks to the NVMe multipath IO path leveraging NVMe's
lack of partial completion, but I think it'd be useful to have this
header (the one that actually gets committed) speak to it as well.
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
new file mode 100644
index 000000000000..062754ebebfd
--- /dev/null
+++ b/drivers/nvme/host/multipath.c
...
+void nvme_failover_req(struct request *req)
+{
+	struct nvme_ns *ns = req->q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ns->head->requeue_lock, flags);
+	blk_steal_bios(&ns->head->requeue_list, req);
+	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+	blk_mq_end_request(req, 0);
+
+	nvme_reset_ctrl(ns->ctrl);
+	kblockd_schedule_work(&ns->head->requeue_work);
+}
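
For readers following the hunk above, the bio-stealing step can be modelled
outside the kernel. The sketch below is a simplified userspace model, not
the kernel's blk_steal_bios(): the struct layouts and the names steal_bios()
and count_bios() are stand-ins assumed for illustration. It shows why
all-or-nothing completion helps: every bio is still attached to the failed
request, so the whole chain can be moved to the requeue list and the emptied
request can then be ended without completing any bio.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; NOT the real definitions. */
struct bio {
	int id;
	struct bio *next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

struct request {
	struct bio *bio;	/* first bio still attached to the request */
	struct bio *biotail;	/* last bio attached to the request */
};

/*
 * Append every bio attached to the request to the list, then leave the
 * request empty so that ending it completes nothing.
 */
void steal_bios(struct bio_list *list, struct request *rq)
{
	assert(list != NULL && rq != NULL);

	if (!rq->bio)
		return;

	if (list->tail)
		list->tail->next = rq->bio;
	else
		list->head = rq->bio;
	list->tail = rq->biotail;

	rq->bio = NULL;
	rq->biotail = NULL;
}

/* Count the bios currently queued on the list. */
int count_bios(const struct bio_list *list)
{
	int n = 0;

	for (const struct bio *b = list->head; b; b = b->next)
		n++;
	return n;
}
```

In this model, a request carrying two chained bios ends up with both on the
requeue list in their original order, and the request itself holds none.
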
Also, the block core patch to introduce blk_steal_bios() already went in,
but should there be a QUEUE_FLAG that gets set by drivers like NVMe that
don't support partial completion?

This would make it easier for other future drivers to know whether they
can use a more optimized IO path.
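
To make the question concrete, here is a rough userspace sketch of how such
a flag might be set and tested. Everything in it is hypothetical: the flag
name EXAMPLE_FLAG_NO_PARTIAL_COMPLETION, struct example_queue, and both
helpers are invented for illustration and do not correspond to any existing
QUEUE_FLAG or block-layer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; no such flag is named anywhere in this thread. */
#define EXAMPLE_FLAG_NO_PARTIAL_COMPLETION	(1UL << 0)

/* Hypothetical stand-in for a request queue carrying driver-set flags. */
struct example_queue {
	unsigned long flags;
};

/* A driver that never completes requests partially would set the flag. */
void example_queue_set_no_partial(struct example_queue *q)
{
	assert(q != NULL);
	q->flags |= EXAMPLE_FLAG_NO_PARTIAL_COMPLETION;
}

/* Other code could test the flag before taking the bio-stealing path. */
bool example_queue_can_steal_bios(const struct example_queue *q)
{
	assert(q != NULL);
	return (q->flags & EXAMPLE_FLAG_NO_PARTIAL_COMPLETION) != 0;
}
```

A queue starts with the flag clear, so the check returns false until the
driver opts in.
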

Mike