Thread: 30 messages, 4 authors, 2021-03-22

Re: [PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs

From: Sagi Grimberg <sagi@grimberg.me>
Date: 2021-03-16 05:05:20

Does the problem exist on the latest version?
This was seen on 5.4 stable, not upstream but nothing prevents
this from happening in upstream code.
We also found similar deadlocks in the older version.
However, with the latest code, it does not block grabbing the nshead srcu
when the ctrl is frozen.
Related patches:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/block/blk-core.c?id=fe2008640ae36e3920cf41507a84fb5d3227435a 

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a6c35f9af416114588298aa7a90b15bbed15a41 

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/block/blk-core.c?id=ed00aabd5eb9fb44d6aff1173234a2e911b9fead 

I am not sure they are the same problem.
It's not the same problem.

When we tear down the I/O queues, we freeze the namespaces' request queues.
This means that concurrent mpath submit_bio calls can now block with
the srcu lock taken.

When another path calls nvme_mpath_set_live, it needs to wait for
the srcu to sync before kicking the requeue work (to make sure
the updated current_path is visible).

And this is where the hang is: the only thing that will free it
is the offending controller reconnecting (and unfreezing the queue)
or disconnecting (automatically or manually). Both can take
a very long time, or even forever in some cases.
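
To make the dependency chain above concrete, here is a rough userspace
sketch in Python. It is not kernel code. ToySrcu, FrozenQueue and demo
are hypothetical names invented for this illustration, and the toy
synchronize() simply waits for the reader count to reach zero, which is
cruder than real SRCU grace periods. It only models the ordering: a
submitter takes the read lock, then blocks on the frozen queue, so
anything that has to wait for the srcu to sync is stuck until the queue
is unfrozen.

```python
import threading


class FrozenQueue:
    """Toy stand-in for a frozen request queue (hypothetical, not a kernel API)."""

    def __init__(self):
        self._unfrozen = threading.Event()

    def enter(self):
        # Submitters block here for as long as the queue stays frozen.
        self._unfrozen.wait()

    def unfreeze(self):
        self._unfrozen.set()


class ToySrcu:
    """Toy SRCU (assumption: synchronize just waits for zero active readers)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def read_lock(self):
        with self._cond:
            self._readers += 1

    def read_unlock(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def synchronize(self, timeout=None):
        # Returns True once no reader is active, False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self._readers == 0, timeout)


def mpath_submit_bio(srcu, queue, entered):
    """Models a multipath submit: read lock taken, then blocked on the queue."""
    srcu.read_lock()
    entered.set()
    queue.enter()  # blocks while the path's queue is frozen
    srcu.read_unlock()


def demo():
    srcu, queue, entered = ToySrcu(), FrozenQueue(), threading.Event()
    submitter = threading.Thread(
        target=mpath_submit_bio, args=(srcu, queue, entered), daemon=True
    )
    submitter.start()
    entered.wait()  # the submitter now holds the read lock

    # Models the other path going live: it must wait for the srcu to sync.
    stuck = not srcu.synchronize(timeout=0.2)

    # Models the offending controller reconnecting and unfreezing the queue.
    queue.unfreeze()
    released = srcu.synchronize(timeout=5.0)
    submitter.join()
    return stuck, released
```

Calling demo() should return (True, True): the sync step cannot finish
while the queue is frozen, and it completes once the queue is unfrozen.
The 0.2 second timeout exists only so the sketch terminates; in the
scenario described above there is no such timeout.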

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme