Thread (11 messages) 11 messages, 4 authors, 2022-09-08

Re: [PATCH] sbitmap: fix possible io hung due to lost wakeup

From: Jan Kara <jack@suse.cz>
Date: 2022-09-07 16:42:27
Also in: lkml

On Wed 07-09-22 08:13:40, Keith Busch wrote:
On Wed, Sep 07, 2022 at 12:23:18PM +0200, Jan Kara wrote:
quoted
On Tue 06-09-22 15:27:51, Keith Busch wrote:
quoted
On Wed, Aug 03, 2022 at 08:15:04PM +0800, Yu Kuai wrote:
quoted
 	wait_cnt = atomic_dec_return(&ws->wait_cnt);
-	if (wait_cnt <= 0) {
-		int ret;
+	/*
+	 * For concurrent callers of this, callers should call this function
+	 * again to wakeup a new batch on a different 'ws'.
+	 */
+	if (wait_cnt < 0 || !waitqueue_active(&ws->wait))
+		return true;
If wait_cnt is '0', but the waitqueue_active happens to be false due to racing
with add_wait_queue(), this returns true so the caller will retry.
Well, note that sbq_wake_ptr() called to obtain 'ws' did waitqueue_active()
check. So !waitqueue_active() should really happen only if waiter was woken
up by someone else or so. Not that it would matter much but I wanted to
point it out.
quoted
The next atomic_dec will set the current waitstate wait_cnt < 0, which
also forces an early return true. When does the wake up happen, or
wait_cnt and wait_index get updated in that case?
I guess your concern could be rephrased as: Who's going to ever set
ws->wait_cnt to value > 0 if we ever exit with wait_cnt == 0 due to
!waitqueue_active() condition?

And that is a good question and I think that's a bug in this patch. I think
we need something like:

	...
	/*
	 * For concurrent callers of this, callers should call this function
	 * again to wakeup a new batch on a different 'ws'.
	 */
	if (wait_cnt < 0)
		return true;
	/*
	 * If we decremented queue without waiters, retry to avoid lost
	 * wakeups.
	 */
	if (wait_cnt > 0)
		return !waitqueue_active(&ws->wait);
I'm not sure about this part. We've already decremented, so the freed bit is
accounted for against the batch. Returning true here may double-count the freed
bit, right?
Yes, we may wake up waiters unnecessarily frequently. But that's a
performance issue at worst and only if it happens frequently. So I don't
think it matters in practice (famous last words ;).

								Honza
-- 
Jan Kara [off-list ref]
SUSE Labs, CR
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help