Re: [PATCH] nvme-rdma: fix crash for no IO queues
From: Chao Leng <hidden>
Date: 2021-03-02 09:49:29
On 2021/3/2 15:48, Hannes Reinecke wrote:
> On 2/27/21 10:30 AM, Chao Leng wrote:
>> On 2021/2/27 17:12, Hannes Reinecke wrote:
>>> On 2/24/21 6:59 AM, Chao Leng wrote:
>>>> On 2021/2/24 7:21, Keith Busch wrote:
>>>>> On Tue, Feb 23, 2021 at 03:26:02PM +0800, Chao Leng wrote:
>>>>>> A crash happens when Set Features (NVME_FEAT_NUM_QUEUES) times out during an NVMe over RDMA (RoCE) reconnection. The cause is that a queue which was never allocated gets used. If the controller is not a discovery controller and has no I/O queues, the connection should fail.
>>>>>
>>>>> If you're getting a timeout, we need to quit initialization. Hannes attempted making that status visible for fabrics here:
>>>>> http://lists.infradead.org/pipermail/linux-nvme/2021-January/022353.html
>>>>
>>>> I know the patch. It cannot handle the scenario where the target is an attacker or behaves incorrectly. If the target returns 0 I/O queues or some other error code, the crash will still happen. We should not allow this to happen.
>>>
>>> I'm fully with you that we shouldn't crash, but at the same time a value of '0' for the number of I/O queues is considered valid. So we should fix the code to handle this scenario, not disallow zero I/O queues.
>>
>> '0' I/O queues doesn't make any sense for NVMe over Fabrics; it is different from NVMe over PCIe. If there is a bug in the target, we can debug it on the target instead of using the admin queue on the host. The target may be an attacker or may behave incorrectly, so we should avoid the crash. Another option is to prohibit request delivery if the I/O queues were not created. I think failing the connection with '0' I/O queues is the better choice.
>
> Might be, but that's not for me to decide. I tried that initially, but that patch got rejected as _technically_ the controller is reachable via its admin queue.
I know about your patch. That patch failed the connection for all transports. That is not good for the PCIe transport, where the controller can still accept admin commands so that diagnostics can be retrieved (perhaps an error log page); this is Keith's view.
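The positions in this thread can be summarized as a decision the host makes after Set Features (Number of Queues). Below is a minimal C sketch, assuming hypothetical names (`enum xport`, `enum setup_result`, `decide_setup()`) that are not the kernel's API: a timeout aborts initialization (Keith's point), a non-discovery fabrics controller with zero I/O queues fails the connection (Chao's proposal), and a PCIe controller with zero I/O queues stays reachable through its admin queue for diagnostics. It also assumes an error status from the controller has already been folded into `nr_io_queues == 0`.

```c
#include <stdbool.h>

/* Hypothetical transport classification, not the kernel's types. */
enum xport { XPORT_PCIE, XPORT_FABRICS };

/* Hypothetical outcome of controller (re)connection setup. */
enum setup_result {
	SETUP_OK,         /* I/O queues are available (or not needed) */
	SETUP_ADMIN_ONLY, /* keep only the admin queue, e.g. for log pages */
	SETUP_FAIL        /* abort the (re)connect attempt */
};

/*
 * Decide how to proceed after Set Features (Number of Queues).
 * timed_out:    the command never completed, so initialization is aborted.
 * nr_io_queues: I/O queues granted; assumed to be 0 when the
 *               controller returned an error status instead.
 */
static enum setup_result decide_setup(enum xport xport, bool discovery,
				      bool timed_out, unsigned int nr_io_queues)
{
	if (timed_out)
		return SETUP_FAIL;
	/* Discovery controllers only ever use the admin queue. */
	if (nr_io_queues > 0 || discovery)
		return SETUP_OK;
	/* PCIe: stay reachable through the admin queue for diagnostics. */
	if (xport == XPORT_PCIE)
		return SETUP_ADMIN_ONLY;
	/* Fabrics, non-discovery, zero I/O queues: fail the connection. */
	return SETUP_FAIL;
}
```

Keeping the transport as an explicit input reflects the objection to the earlier patch: the zero-queue case is only turned into a failure for fabrics, not for PCIe.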
> Sagi? Christoph?
> Are controllers with 0 I/O queues valid or is this an error condition?
>
> Cheers,
> Hannes
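If zero I/O queues is treated as a valid state, as asked above, the host must make sure nothing is dispatched to a queue that was never allocated, which is what the reported crash amounts to. Below is a minimal C sketch of the "prohibit request delivery" option from the thread, using hypothetical structures and functions (`struct io_queue`, `struct host_ctrl`, `map_request_to_queue()`) rather than the kernel's blk-mq code; the CPU-modulo queue mapping is also a simplifying assumption. The mapping function checks how many I/O queues exist before indexing and returns an error instead.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-queue state standing in for a transport queue. */
struct io_queue {
	int id;
};

/* Hypothetical controller: queues[] holds nr_io_queues entries. */
struct host_ctrl {
	struct io_queue *queues;   /* NULL when no I/O queues were allocated */
	unsigned int nr_io_queues; /* may legitimately be 0 */
};

/*
 * Map a request issued on CPU 'cpu' to an I/O queue.
 * Returns 0 and stores the queue in *out, or -EIO when there is no
 * I/O queue to deliver to (rather than indexing unallocated memory).
 */
static int map_request_to_queue(const struct host_ctrl *ctrl, unsigned int cpu,
				struct io_queue **out)
{
	if (ctrl->nr_io_queues == 0 || ctrl->queues == NULL)
		return -EIO;
	*out = &ctrl->queues[cpu % ctrl->nr_io_queues];
	return 0;
}
```

With this guard an admin-only controller stays usable for admin commands, while any I/O aimed at it fails cleanly instead of crashing.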
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme