Thread (37 messages) 37 messages, 6 authors, 2017-12-21

Re: [PATCH 2/6] blk-mq: replace timeout synchronization with a RCU and generation based scheme

From: jianchao.wang <hidden>
Date: 2017-12-15 02:13:59
Also in: lkml


On 12/15/2017 05:54 AM, Peter Zijlstra wrote:
On Thu, Dec 14, 2017 at 09:42:48PM +0000, Bart Van Assche wrote:
quoted
On Thu, 2017-12-14 at 21:20 +0100, Peter Zijlstra wrote:
quoted
On Thu, Dec 14, 2017 at 06:51:11PM +0000, Bart Van Assche wrote:
quoted
On Tue, 2017-12-12 at 11:01 -0800, Tejun Heo wrote:
quoted
+	write_seqcount_begin(&rq->gstate_seq);
+	blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
+	blk_add_timer(rq);
+	write_seqcount_end(&rq->gstate_seq);
My understanding is that both write_seqcount_begin() and write_seqcount_end()
trigger a write memory barrier. Is a seqcount really faster than a spinlock?
Yes lots, no atomic operations and no waiting.

The only constraint for write_seqlock is that there must not be any
concurrency.

But now that I look at this again, TJ, why can't the below happen?

	write_seqlock_begin();
	blk_mq_rq_update_state(rq, IN_FLIGHT);
	blk_add_timer(rq);
	<timer-irq>
		read_seqcount_begin()
			while (seq & 1)
				cpurelax();
		// life-lock
	</timer-irq>
	write_seqlock_end();
Hello Peter,

Some time ago the block layer was changed to handle timeouts in thread context
instead of interrupt context. See also commit 287922eb0b18 ("block: defer
timeouts to a workqueue").
That only makes it a little better:

	Task-A					Worker

	write_seqcount_begin()
	blk_mq_rw_update_state(rq, IN_FLIGHT)
	blk_add_timer(rq)
	<timer>
		schedule_work()
	</timer>
	<context-switch to worker>
						read_seqcount_begin()
							while(seq & 1)
								cpu_relax();
Hi Peter

The current seqcount read side is as below:
	do {
		start = read_seqcount_begin(&rq->gstate_seq);
		gstate = READ_ONCE(rq->gstate);
		deadline = rq->deadline;
	} while (read_seqcount_retry(&rq->gstate_seq, start));
read_seqcount_retry() doesn't check the bit 0, but whether the saved value from 
read_seqcount_begin() is equal to the current value of seqcount.
pls refer:
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
{
	return unlikely(s->sequence != start);
}

Thanks
Jianchao
Now normally this isn't fatal because Worker will simply spin its entire
time slice away and we'll eventually schedule our Task-A back in, which
will complete the seqcount and things will work.

But if, for some reason, our Worker was to have RT priority higher than
our Task-A we'd be up some creek without no paddles.

We don't happen to have preemption of IRQs off here? That would fix
things nicely.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help