Re: [PATCHv6] nvme: allow to re-attach namespaces after all paths are down | linux-nvme

On 6/21/21 8:13 PM, Sagi Grimberg wrote:

On 6/9/21 8:01 AM, Hannes Reinecke wrote:
We should only remove the ns head from the list of heads per
subsystem if the reference count drops to zero. That cleans up
reference counting, and allows us to call del_gendisk() once the last
path is removed (as then the ns_head should be removed anyway).
As this introduces a (theoretical) race condition where I/O might have
been requeued before the last path went down, we should also be checking
whether the gendisk is still present in nvme_ns_head_submit_bio(),
and failing I/O if it is not.

Changes to v5:
- Synchronize between nvme_init_ns_head() and nvme_mpath_check_last_path()
- Check for removed gendisk in nvme_ns_head_submit_bio()
Changes to v4:
- Call del_gendisk() in nvme_mpath_check_last_path() to avoid deadlock
Changes to v3:
- Simplify if() clause to detect duplicate namespaces
Changes to v2:
- Drop memcpy() statement
Changes to v1:
- Always check NSIDs after reattach

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
  drivers/nvme/host/core.c      |  9 ++++-----
  drivers/nvme/host/multipath.c | 30 +++++++++++++++++++++++++-----
  drivers/nvme/host/nvme.h      | 11 ++---------
  3 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 177cae44b612..6d7c2958b3e2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -566,6 +566,9 @@ static void nvme_free_ns_head(struct kref *ref)
      struct nvme_ns_head *head =
          container_of(ref, struct nvme_ns_head, ref);
+    mutex_lock(&head->subsys->lock);
+    list_del_init(&head->entry);
+    mutex_unlock(&head->subsys->lock);
      nvme_mpath_remove_disk(head);
      ida_simple_remove(&head->subsys->ns_ida, head->instance);
      cleanup_srcu_struct(&head->srcu);
@@ -3806,8 +3809,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
   out_unlink_ns:
      mutex_lock(&ctrl->subsys->lock);
      list_del_rcu(&ns->siblings);
-    if (list_empty(&ns->head->list))
-        list_del_init(&ns->head->entry);
      mutex_unlock(&ctrl->subsys->lock);
      nvme_put_ns_head(ns->head);
   out_free_queue:
@@ -3828,8 +3829,6 @@ static void nvme_ns_remove(struct nvme_ns *ns)
      mutex_lock(&ns->ctrl->subsys->lock);
      list_del_rcu(&ns->siblings);
-    if (list_empty(&ns->head->list))
-        list_del_init(&ns->head->entry);
      mutex_unlock(&ns->ctrl->subsys->lock);
      synchronize_rcu(); /* guarantee not available in head->list */
@@ -3849,7 +3848,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
      list_del_init(&ns->list);
      up_write(&ns->ctrl->namespaces_rwsem);
-    nvme_mpath_check_last_path(ns);
+    nvme_mpath_check_last_path(ns->head);
      nvme_put_ns(ns);
  }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 23573fe3fc7d..31153f6ec582 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -266,6 +266,8 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
      int node = numa_node_id();
      struct nvme_ns *ns;
+    if (!(head->disk->flags & GENHD_FL_UP))
+        return NULL;
      ns = srcu_dereference(head->current_path[node], &head->srcu);
      if (unlikely(!ns))
          return __nvme_find_path(head, node);
@@ -281,6 +283,8 @@ static bool nvme_available_path(struct nvme_ns_head *head)
  {
      struct nvme_ns *ns;
+    if (!(head->disk->flags & GENHD_FL_UP))
+        return false;

nvme_available_path should have no business looking at the head gendisk;
it should just understand if a PATH (a.k.a. a controller) exists.

Agreed. I was only overly cautious here; will be dropping this check.

IMO, the fact that it does should tell us that we should take a step back
and think about this. We are trying to keep a zombie nshead around
just for the possibility the host will reconnect (not as part of
error recovery, but as a brand new connect). Why shouldn't we just
remove it and restore it as a brand new nshead when the host attaches
again?

This patch has now evolved quite a bit, and in fact diverged slightly 
from the description. The original intent indeed was to keep the nshead 
around until the last reference drops, such that if a controller gets 
reattached it will be able to connect the namespaces to the correct 
(existing) ns_head.
However, as it turned out this was just a band-aid, and the real fix is 
to get the reference counts between 'struct ns' and 'struct ns_head' 
correct: if the last path to a ns_head drops, we should be removing the 
ns_head by calling del_gendisk() and removing it from the list of ns_heads.
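
To make the intended lifetime rule concrete, here is a minimal userspace
sketch of it. This is only a model, not the driver code: model_head,
model_subsys and the model_* helpers are hypothetical names, disk_live
merely stands in for the gendisk being registered, and all locking is
left out. It shows the last path removal both marking the disk as gone
and unlinking the head, so a later lookup by NSID finds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; this is NOT the kernel's struct nvme_ns_head. */
struct model_head {
	struct model_head *next;	/* link in the per-subsystem list */
	unsigned int nsid;
	int nr_paths;			/* number of attached paths */
	bool disk_live;			/* stands in for a registered gendisk */
};

struct model_subsys {
	struct model_head *heads;	/* singly linked list of heads */
};

static void model_add_head(struct model_subsys *s, struct model_head *h,
			   unsigned int nsid)
{
	h->nsid = nsid;
	h->nr_paths = 0;
	h->disk_live = true;
	h->next = s->heads;
	s->heads = h;
}

static struct model_head *model_find_head(struct model_subsys *s,
					  unsigned int nsid)
{
	struct model_head *h;

	for (h = s->heads; h; h = h->next)
		if (h->nsid == nsid)
			return h;
	return NULL;
}

static void model_add_path(struct model_head *h)
{
	h->nr_paths++;
}

/* Returns true if this was the last path and the head was torn down. */
static bool model_remove_path(struct model_subsys *s, struct model_head *h)
{
	struct model_head **pp;

	assert(h->nr_paths > 0);
	if (--h->nr_paths > 0)
		return false;

	h->disk_live = false;		/* models del_gendisk() */
	for (pp = &s->heads; *pp; pp = &(*pp)->next) {
		if (*pp == h) {		/* models removal from the list of heads */
			*pp = h->next;
			h->next = NULL;
			break;
		}
	}
	return true;
}
```

With two paths attached, the first model_remove_path() call leaves the
head findable; the second one tears it down, and a reattach would have
to allocate a fresh head.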

As noted by Keith the first part is done correctly in this patch (namely 
del_gendisk() is called when the last path drops), but the second bit of 
detaching it from the list of ns_heads is _not_ done correctly.
Both should be happening at the same time to avoid any race conditions.
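
The window in question can be illustrated with a small single-threaded
sketch. Again this is a hypothetical model (the sketch_* names are made
up for illustration and are not driver functions): the two teardown
steps are exposed separately so that a lookup can be placed between
them, which is what a concurrent reattach could effectively do if the
steps are not covered by the same critical section.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a head on the subsystem list. */
struct sketch_head {
	struct sketch_head *next;
	unsigned int nsid;
	bool disk_live;			/* stands in for a registered gendisk */
};

static struct sketch_head *sketch_lookup(struct sketch_head *list,
					 unsigned int nsid)
{
	for (; list; list = list->next)
		if (list->nsid == nsid)
			return list;
	return NULL;
}

/* Step 1 of teardown: models del_gendisk(). */
static void sketch_del_disk(struct sketch_head *h)
{
	h->disk_live = false;
}

/* Step 2 of teardown: models removal from the list of heads. */
static void sketch_unlink(struct sketch_head **list, struct sketch_head *h)
{
	struct sketch_head **pp;

	for (pp = list; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			h->next = NULL;
			return;
		}
	}
}

/* True if a reattach would bind to a head whose disk is already gone. */
static bool sketch_reattach_sees_stale(struct sketch_head *list,
				       unsigned int nsid)
{
	struct sketch_head *h = sketch_lookup(list, nsid);

	return h && !h->disk_live;
}
```

Calling sketch_reattach_sees_stale() after sketch_del_disk() but before
sketch_unlink() returns true; once both steps have run it returns false.
Doing both steps under one lock hold removes that intermediate state
from what any other thread can observe.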

Will be sending an updated patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
