Thread: 15 messages, 4 authors, 2021-03-03

Re: [PATCH] nvme-rdma: fix crash for no IO queues

From: Chao Leng <hidden>
Date: 2021-03-02 09:49:29


On 2021/3/2 15:48, Hannes Reinecke wrote:
On 2/27/21 10:30 AM, Chao Leng wrote:

On 2021/2/27 17:12, Hannes Reinecke wrote:
On 2/24/21 6:59 AM, Chao Leng wrote:

On 2021/2/24 7:21, Keith Busch wrote:
On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
A crash happens when set feature(NVME_FEAT_NUM_QUEUES) times out during
nvme over rdma(roce) reconnection. The reason is that a queue which was
never allocated gets used.

If it is not a discovery controller and there are no io queues, the
connection should fail.
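
A minimal sketch of the check described above, written as standalone C
rather than as actual nvme-rdma driver code. The struct, the function
name, and the chosen error codes are hypothetical and only illustrate
the idea: a failed Set Features command, or zero granted I/O queues on
a non-discovery controller, fails the connection instead of letting
setup continue.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for host-side controller state. */
struct sketch_ctrl {
    bool is_discovery;         /* discovery controllers need no I/O queues */
    unsigned int nr_io_queues; /* I/O queues granted by the target */
};

/*
 * Decide whether controller setup may continue.
 * set_feature_status: 0 on success, non-zero if the Number of Queues
 * command timed out or the target returned an error.
 * Returns 0 to continue, or a negative errno to fail the connection.
 * The specific errno values are an assumption made for this sketch.
 */
int sketch_check_io_queues(const struct sketch_ctrl *ctrl,
                           int set_feature_status)
{
    /* A timeout or error status means initialization must stop. */
    if (set_feature_status != 0)
        return -EIO;

    /* A non-discovery fabrics controller without I/O queues is rejected. */
    if (!ctrl->is_discovery && ctrl->nr_io_queues == 0)
        return -ENOTCONN;

    return 0;
}
```

With this shape, a discovery controller reporting zero I/O queues still
passes, while an I/O controller reporting zero is refused before any
unallocated queue can be touched.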
If you're getting a timeout, we need to quit initialization. Hannes
attempted making that status visible for fabrics here:

http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
I know the patch. It does not cover this scenario: the target may be an
attacker, or the target may behave incorrectly.
If the target returns 0 io queues or returns some other error code, the
crash will still happen. We should not allow this to happen.
I'm fully with you that we shouldn't crash, but at the same time a
value of '0' for the number of I/O queues is considered valid.
So we should fix the code to handle this scenario, rather than
disallowing zero I/O queues.
'0' I/O queues doesn't make any sense for nvme over fabrics; it is
different from nvme over pci. If there is some bug in the target, we can
debug it on the target instead of using the admin queue on the host.
The target may be an attacker, or the target behavior may be incorrect,
so we should avoid the crash. Another option: prohibit request delivery
if the io queues have not been created.
I think failing the connection with '0' I/O queues is the better choice.
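
For comparison, the other option mentioned here (refusing to deliver
requests to queues that were never created) could look roughly like the
sketch below. It is standalone C with hypothetical types and names, not
the driver's real request path, and the errno values are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state; index 0 stands for the admin queue. */
struct sketch_queue {
    bool live; /* true once the queue is allocated and connected */
};

/* Hypothetical host-side view of the queues that were actually set up. */
struct sketch_host {
    struct sketch_queue *queues;
    size_t queue_count; /* number of entries in queues[], admin included */
};

/*
 * Gate a request before it is handed to a queue.
 * Returns 0 if the request may be delivered, or a negative errno if the
 * target queue does not exist or is not ready.
 */
int sketch_queue_rq(const struct sketch_host *host, size_t qid)
{
    /* Never index past the queues that were really created. */
    if (qid >= host->queue_count)
        return -ENXIO;

    /* A queue that exists but is not connected cannot take requests. */
    if (!host->queues[qid].live)
        return -EAGAIN;

    return 0;
}
```

This keeps a controller with only an admin queue usable, at the price
of a check on every submission, which is one reason failing the
connection up front may be simpler.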
Might be, but that's not for me to decide.
I tried that initially, but that patch got rejected as _technically_ the
controller is reachable via its admin queue.
I know about your patch. That patch fails the connection for all
transports. That is not good for the pcie transport, where the controller
can still accept admin commands to get some diagnostics (perhaps an error
log page); this is Keith's view.
Sagi? Christoph?
Are controllers with 0 I/O queues valid or is this an error condition?

Cheers,

Hannes
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme