Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and... | linux-block

[PATCHSET v2] blk-mq: reimplement timeout handling · Tejun Heo <tj@kernel.org> · 2017-12-12
[PATCH 1/6] blk-mq: protect completion path with RCU · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCH 1/6] blk-mq: protect completion path with RCU · jianchao.wang <hidden> · 2017-12-13
Re: [PATCH 1/6] blk-mq: protect completion path with RCU · Tejun Heo <tj@kernel.org> · 2017-12-13
Re: [PATCH 1/6] blk-mq: protect completion path with RCU · jianchao.wang <hidden> · 2017-12-14
Re: [PATCH 1/6] blk-mq: protect completion path with RCU · Bart Van Assche <hidden> · 2017-12-14
Re: [PATCH 1/6] blk-mq: protect completion path with RCU · "tj@kernel.org" <tj@kernel.org> · 2017-12-14
[PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Bart Van Assche <hidden> · 2017-12-12
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · "tj@kernel.org" <tj@kernel.org> · 2017-12-12
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · jianchao.wang <hidden> · 2017-12-13
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Tejun Heo <tj@kernel.org> · 2017-12-13
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Bart Van Assche <hidden> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · "tj@kernel.org" <tj@kernel.org> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Bart Van Assche <hidden> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · "tj@kernel.org" <tj@kernel.org> · 2017-12-15
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Peter Zijlstra <peterz@infradead.org> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Bart Van Assche <hidden> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Peter Zijlstra <peterz@infradead.org> · 2017-12-14
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · jianchao.wang <hidden> · 2017-12-15
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Peter Zijlstra <peterz@infradead.org> · 2017-12-15
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · jianchao.wang <hidden> · 2017-12-15
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · Mike Galbraith <hidden> · 2017-12-15
Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme · "tj@kernel.org" <tj@kernel.org> · 2017-12-15
[PATCH 4/6] blk-mq: make blk_abort_request() trigger timeout path · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCH 4/6] blk-mq: make blk_abort_request() trigger timeout path · Bart Van Assche <hidden> · 2017-12-14
Re: [PATCH 4/6] blk-mq: make blk_abort_request() trigger timeout path · "tj@kernel.org" <tj@kernel.org> · 2017-12-14
[PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED · Bart Van Assche <hidden> · 2017-12-12
Re: [PATCH 6/6] blk-mq: remove REQ_ATOM_STARTED · "tj@kernel.org" <tj@kernel.org> · 2017-12-12
[PATCH 5/6] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq · Tejun Heo <tj@kernel.org> · 2017-12-12
[PATCH 3/6] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCHSET v2] blk-mq: reimplement timeout handling · Jens Axboe <axboe@kernel.dk> · 2017-12-12
Re: [PATCHSET v2] blk-mq: reimplement timeout handling · Tejun Heo <tj@kernel.org> · 2017-12-12
Re: [PATCHSET v2] blk-mq: reimplement timeout handling · Bart Van Assche <hidden> · 2017-12-20
Re: [PATCHSET v2] blk-mq: reimplement timeout handling · "tj@kernel.org" <tj@kernel.org> · 2017-12-21
Re: [PATCHSET v2] blk-mq: reimplement timeout handling · Bart Van Assche <hidden> · 2017-12-21

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

From: jianchao.wang <hidden>
Date: 2017-12-15 02:13:59
Also in: lkml


On 12/15/2017 05:54 AM, Peter Zijlstra wrote:

On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:

On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote:

On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:

On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:

+	write_seqcount_begin(&rq->gstate_seq);
+	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
+	blk_add_timer(rq);
+	write_seqcount_end(&rq->gstate_seq);
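The hunk above marks the request in flight and arms its timer inside one seqcount write section. The "generation" in the subject line refers to a counter kept next to the request state, so that a timeout handler holding a stale observation can tell the request has been recycled. Below is a minimal sketch of such a combined state word. The bit layout, enum values and function names are assumptions made for illustration; they are not copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the low 2 bits hold the
 * request state and the remaining bits hold a generation counter. */
enum rq_state_model {
	RQ_IDLE = 0,
	RQ_IN_FLIGHT = 1,
	RQ_COMPLETE = 2,
};

#define RQ_STATE_BITS 2
#define RQ_STATE_MASK ((UINT64_C(1) << RQ_STATE_BITS) - 1)
#define RQ_GEN_INC (UINT64_C(1) << RQ_STATE_BITS)

static enum rq_state_model model_rq_state(uint64_t gstate)
{
	return (enum rq_state_model)(gstate & RQ_STATE_MASK);
}

/* Replace the state bits. Every transition into IN_FLIGHT also bumps the
 * generation, so an older sampled gstate never equals a newer one. */
static uint64_t model_rq_update_state(uint64_t gstate, enum rq_state_model state)
{
	uint64_t next = (gstate & ~RQ_STATE_MASK) | (uint64_t)state;

	if (state == RQ_IN_FLIGHT) {
		/* assumed invariant: only an idle request is started */
		assert(model_rq_state(gstate) == RQ_IDLE);
		next += RQ_GEN_INC;
	}
	return next;
}
```

In this model, a handler that sampled the word during the first in-flight period sees a different value once the request has completed and been started again, even though the state bits read IN_FLIGHT both times.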

My understanding is that both write_seqcount_begin() and write_seqcount_end()
trigger a write memory barrier. Is a seqcount really faster than a spinlock?

Yes, lots: no atomic operations and no waiting.

The only constraint for write_seqcount_begin()/write_seqcount_end() is that
there must not be any concurrent writers.
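Peter's answer can be made concrete with a small user-space model. It assumes C11 atomics and fences stand in for the kernel's smp_wmb()/smp_rmb(), and it follows the commonly used C11 seqlock pattern (relaxed data accesses bracketed by fences). struct rq_model, model_write() and model_read() are hypothetical names, not kernel APIs. The writer performs only ordinary loads, stores and fences, so its cost is the barriers Bart mentions rather than a contended atomic read-modify-write, but nothing in it tolerates a second writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a seqcount protecting two fields that must be
 * read together, like rq->gstate and rq->deadline. */
struct rq_model {
	atomic_uint seq;                 /* odd while a write is in progress */
	_Atomic uint64_t gstate;         /* data fields are relaxed atomics */
	_Atomic unsigned long deadline;  /* so the model has no data races */
};

static void model_init(struct rq_model *rq)
{
	atomic_init(&rq->seq, 0);
	atomic_init(&rq->gstate, 0);
	atomic_init(&rq->deadline, 0);
}

/* Writer: plain load/store of the counter plus fences. No atomic RMW and
 * no waiting, which is why only one writer may ever be active. */
static void model_write(struct rq_model *rq, uint64_t gstate, unsigned long deadline)
{
	unsigned s = atomic_load_explicit(&rq->seq, memory_order_relaxed);

	assert((s & 1) == 0 && "concurrent or nested writer");
	atomic_store_explicit(&rq->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);   /* roughly smp_wmb() */

	atomic_store_explicit(&rq->gstate, gstate, memory_order_relaxed);
	atomic_store_explicit(&rq->deadline, deadline, memory_order_relaxed);

	/* release store: the data stores above are ordered before it */
	atomic_store_explicit(&rq->seq, s + 2, memory_order_release);
}

/* Reader: retry until both fields come from the same completed write. */
static void model_read(struct rq_model *rq, uint64_t *gstate, unsigned long *deadline)
{
	unsigned start, end;

	do {
		do {  /* wait for an even count, like read_seqcount_begin() */
			start = atomic_load_explicit(&rq->seq, memory_order_acquire);
		} while (start & 1);
		*gstate = atomic_load_explicit(&rq->gstate, memory_order_relaxed);
		*deadline = atomic_load_explicit(&rq->deadline, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);  /* roughly smp_rmb() */
		end = atomic_load_explicit(&rq->seq, memory_order_relaxed);
	} while (start != end);  /* like read_seqcount_retry() */
}
```

A spinlock would instead need an atomic exchange or compare-and-swap on every write section and could make the writer wait; the seqcount pushes all retrying onto the reader.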

But now that I look at this again, TJ, why can't the following happen?

	write_seqcount_begin();
	blk_mq_rq_update_state(rq, IN_FLIGHT);
	blk_add_timer(rq);
	<timer-irq>
		read_seqcount_begin()
			while (seq & 1)
				cpu_relax();
		// livelock
	</timer-irq>
	write_seqcount_end();
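The `while (seq & 1)` spin in the diagram is the part of read_seqcount_begin() that waits for the writer to finish. The sketch below models that wait with C11 atomics and, unlike the real primitive, gives up after a bounded number of iterations so the livelock can be observed instead of hanging. model_read_begin_bounded() and max_spins are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of read_seqcount_begin(): spin while the count is
 * odd. Returns false if the count was still odd after max_spins tries. */
static bool model_read_begin_bounded(atomic_uint *seq, unsigned max_spins,
				     unsigned *start)
{
	for (unsigned i = 0; i < max_spins; i++) {
		unsigned s = atomic_load_explicit(seq, memory_order_acquire);

		if ((s & 1) == 0) {
			*start = s;
			return true;
		}
		/* cpu_relax() would go here. If the open write section
		 * belongs to the code this reader interrupted on the same
		 * CPU, nothing can make the count even again. */
	}
	return false;
}
```

Running the reader while the write section it interrupted is still open reproduces the problem: the count never changes, so only the artificial bound ends the loop.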

Hello Peter,

Some time ago the block layer was changed to handle timeouts in thread context
instead of interrupt context. See also commit 287922eb0b18 ("block: defer
timeouts to a workqueue").

That only makes it a little better:

	Task-A					Worker

	write_seqcount_begin()
	blk_mq_rq_update_state(rq, IN_FLIGHT)
	blk_add_timer(rq)
	<timer>
		schedule_work()
	</timer>
	<context-switch to worker>
						read_seqcount_begin()
							while(seq & 1)
								cpu_relax();

Hi Peter

The current seqcount read side is as below:
	do {
		start = read_seqcount_begin(&rq->gstate_seq);
		gstate = READ_ONCE(rq->gstate);
		deadline = rq->deadline;
	} while (read_seqcount_retry(&rq->gstate_seq, start));

read_seqcount_retry() doesn't check bit 0; it checks whether the value saved by
read_seqcount_begin() still equals the current value of the seqcount.
Please refer to:
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
{
	return unlikely(s->sequence != start);
}
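To illustrate the retry check described above, here is a single-threaded model in which a complete write section runs between the reader's two loads, leaving a torn (gstate, deadline) pair that the retry step rejects because the count moved. The struct and function names are hypothetical, and the model deliberately ignores memory ordering. In it, as in the diagrams quoted above, waiting for an even count belongs to the begin step, while the retry step only compares against the snapshot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model (no real concurrency): the read
 * loop is split into steps so a whole write section can run between
 * them. */
struct rq_snap_model {
	unsigned sequence;
	unsigned long gstate;
	unsigned long deadline;
};

static void model_write(struct rq_snap_model *rq, unsigned long gstate,
			unsigned long deadline)
{
	rq->sequence++;          /* write_seqcount_begin(): now odd */
	rq->gstate = gstate;
	rq->deadline = deadline;
	rq->sequence++;          /* write_seqcount_end(): even again */
}

/* Like __read_seqcount_retry(): only asks whether the counter changed
 * since the snapshot; it never looks at bit 0. */
static bool model_read_retry(const struct rq_snap_model *rq, unsigned start)
{
	return rq->sequence != start;
}
```

Because a completed write advances the count by two, the snapshot comparison catches it even though the count is even again by the time the reader checks.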

Thanks
Jianchao

Now normally this isn't fatal because Worker will simply spin its entire
time slice away and we'll eventually schedule our Task-A back in, which
will complete the seqcount and things will work.

But if, for some reason, our Worker was to have RT priority higher than
our Task-A we'd be up some creek without no paddles.

We don't happen to have preemption or IRQs off here? That would fix
things nicely.
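The fix suggested above is to run the write section with preemption or interrupts disabled (in the kernel, for example preempt_disable()/preempt_enable() or local_irq_save()/local_irq_restore()). Disabling interrupts keeps a timer handler from running mid-update; disabling preemption keeps a worker such as the one in the second diagram from being switched in. The single-CPU model below only covers the interrupt variant: a timer that fires while interrupts are off is left pending, and its handler runs after write_seqcount_end(). All names are hypothetical, and the model records a would-be livelock instead of actually spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model with a deferrable "timer interrupt". */
struct cpu_model {
	unsigned sequence;
	bool irqs_disabled;
	bool timer_pending;
	int reads_done;   /* handler saw an even count and could read */
	int reads_stuck;  /* handler saw an odd count: would spin forever */
};

static void model_timer_handler(struct cpu_model *cpu)
{
	if ((cpu->sequence & 1) == 0)
		cpu->reads_done++;
	else
		cpu->reads_stuck++;  /* real code would livelock here */
}

static void model_timer_fires(struct cpu_model *cpu)
{
	if (cpu->irqs_disabled)
		cpu->timer_pending = true;   /* delivered on re-enable */
	else
		model_timer_handler(cpu);
}

static void model_irq_enable(struct cpu_model *cpu)
{
	cpu->irqs_disabled = false;
	if (cpu->timer_pending) {
		cpu->timer_pending = false;
		model_timer_handler(cpu);
	}
}

/* The write section from the patch, optionally with interrupts off. */
static void model_start_request(struct cpu_model *cpu, bool disable_irqs)
{
	if (disable_irqs)
		cpu->irqs_disabled = true;   /* stands in for local_irq_save() */
	cpu->sequence++;                 /* write_seqcount_begin() */
	model_timer_fires(cpu);          /* timer expires mid-update */
	cpu->sequence++;                 /* write_seqcount_end() */
	if (disable_irqs)
		model_irq_enable(cpu);       /* stands in for local_irq_restore() */
}
```

Without interrupt disabling the handler observes an odd count; with it, the same timer is handled after the count is even again.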
