Re: [PATCH v2] block: I/O error occurs during SATA disk stress test
From: Jens Axboe <axboe@kernel.dk>
Date: 2022-08-26 13:36:17
On 8/25/22 9:25 PM, gumi@linux.alibaba.com wrote:
On 8/25/22 00:09, Gu Mi wrote:
The problem occurs in two async processes. One is when a new IO calls
the blk_mq_start_request() interface to start sending. The other is
that the block layer timer process calls the blk_mq_req_expired()
interface to check whether there is an IO timeout.

When an instruction out of sequence occurs between blk_add_timer() and
WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT) in the interface
blk_mq_start_request(), and at this time the block timer is checking
the new IO timeout, then since the req status has been set to
MQ_RQ_IN_FLIGHT and req->deadline is 0 at this time, the new IO will be
misjudged as a timeout.

Our repair plan is for the deadline to be 0, and we do not think that a
timeout occurs. At the same time, because the jiffies of the 32-bit
system will be reversed shortly after the system is turned on, we will
add 1 jiffies to the deadline at this time.

Signed-off-by: Gu Mi <redacted>
---
v1->v2: time_after_eq() can handle the overflow, so remove the change
on 32-bit in blk_add_timer().

 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b90d2d..6defaa1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 		return false;
 	deadline = READ_ONCE(rq->deadline);
+	if (unlikely(deadline == 0))
+		return false;
 	if (time_after_eq(jiffies, deadline))
 		return true;

rq->deadline == 0 can be a valid deadline value so the above patch
doesn't look right to me.
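For readers following along, here is a rough userspace sketch of the comparison being discussed. It is not kernel code: time_after_eq_model() is a hypothetical stand-in for the kernel's time_after_eq() macro, and a 32-bit counter is assumed so the wraparound is easy to see. It shows two things: the signed-difference comparison already copes with the counter wrapping, and a deadline that still reads as 0 looks "expired" for any counter value in the lower half of the range.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of time_after_eq(a, b): true if time "a"
 * is at or after time "b". The unsigned subtraction wraps modulo 2^32,
 * and the result is then read as a signed value, so the comparison stays
 * correct as long as the two times are less than 2^31 ticks apart.
 * (Assumes the usual two's complement conversion to int32_t.)
 */
static bool time_after_eq_model(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Assumed scenario from the report: the timeout checker sees the request
 * in flight while the deadline field still reads as 0.
 */
static bool stale_zero_deadline_looks_expired(uint32_t now)
{
	uint32_t deadline = 0; /* not yet visible to the checker */

	return time_after_eq_model(now, deadline);
}
```

With now = 1000 the stale deadline is reported as expired, which matches the misjudged timeout described in the report. Note that time_after_eq_model(5, 0xFFFFFFF0) is also true: the counter has wrapped past the deadline and the comparison still gets it right, which is why v2 dropped the 32-bit special case.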
Gu, you need to fix your quoting of emails, these are impossible to read.

That aside, I think there's a misunderstanding here. v1 has some parts
and v2 has others. Please post a v3 that has the hunk that guarantees
that deadline always has the lowest bit set if assigned, and the
!deadline check as well.

-- 
Jens Axboe
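As a rough illustration of how the two pieces requested above fit together, here is a userspace sketch. It is not the v3 patch, and the thread does not show what that patch looked like. struct request_model, arm_deadline() and req_expired() are hypothetical names, a 32-bit tick counter is assumed, and the memory ordering between the two kernel paths is ignored entirely. The idea is that if every assigned deadline has its lowest bit set, then 0 can only mean "not armed yet", so the !deadline check no longer collides with a valid deadline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct request we care about. */
struct request_model {
	uint32_t deadline; /* 0 means "no deadline assigned yet" */
};

/*
 * Arm the timeout. Forcing the lowest bit guarantees that an assigned
 * deadline is never 0, at the cost of the deadline being up to one tick
 * later than now + timeout.
 */
static void arm_deadline(struct request_model *rq, uint32_t now,
			 uint32_t timeout)
{
	rq->deadline = (now + timeout) | 1;
}

/* Expiry check: an unarmed request (deadline == 0) never counts as expired. */
static bool req_expired(const struct request_model *rq, uint32_t now)
{
	uint32_t deadline = rq->deadline;

	if (deadline == 0)
		return false;

	/* Wraparound-safe "now is at or after deadline" comparison. */
	return (int32_t)(now - deadline) >= 0;
}
```

In this model, arming with now = 0xFFFFFFF0 and timeout = 0x10 would have produced a deadline of exactly 0 after wraparound. With the low bit forced it becomes 1, so the request is still distinguishable from an unarmed one.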