Re: [PATCHv4 0/3] scsi timeout handling updates
From: Keith Busch <hidden>
Date: 2018-11-29 17:23:31
On Thu, Nov 29, 2018 at 06:11:59PM +0100, Christoph Hellwig wrote:
quoted
diff --git a/block/blk-mq.c b/block/blk-mq.c index a82830f39933..d0ef540711c7 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c@@ -647,7 +647,7 @@ EXPORT_SYMBOL(blk_mq_complete_request); int blk_mq_request_started(struct request *rq) { - return blk_mq_rq_state(rq) != MQ_RQ_IDLE; + return blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT; } EXPORT_SYMBOL_GPL(blk_mq_request_started);Independ of this series this change looks like the right thing to do. But this whole area is a mine field, so separate testing would be very helpful. I also wonder why we even bother with the above helper, a direct state comparism seems a lot more obvious to the reader.
I think it's just because blk_mq_rq_state() is a private interface. The enum mq_rq_state is defined under include/linux/, so it looks okay to make getting the state public too.
Last but not least the blk_mq_request_started check in nbd should probably be lifted into blk_mq_tag_to_rq while we're at it.. As for the nvme issue - it seems to me like we need to decouple the nvme loop frontend request from the target backend request. In case of a timeout/reset we'll complete struct request like all other nvme transport drivers, but we leave the backend target state around, which will be freed when it completes (or leaks when the completion is lost).
I don't think nvme's loop target should do anything to help a command complete. It shouldn't even implement a timeout for the same reason no stacking block driver implements these. If a request is stuck, the lowest level is the only driver that should have the responsibility to make it unstuck.