Re: [PATCH] bcache: smooth writeback rate control

From: Coly Li <hidden>
Date: 2017-09-21 19:20:57

On 2017/9/20 下午11:50, Michael Lyle wrote:

On Wed, Sep 20, 2017 at 2:07 PM, Coly Li [off-list ref] wrote:

quoted

I get your point here. Current code sets minimum writeback rate to 1
sector, and indeed it writes 4K. But parameter "done" sent to
bch_next_delay() is 1 sector, indeed the correct value should be 8 sectors.

Not quite.  The value sent to bch_next_delay is 8 sectors.  So then
the code wants to sleep 8 seconds, but it is only willing to sleep at
most one second.  In other words, when rate is low, the controller
becomes ineffective and does one write of any size no matter what.
This keeps the controller effective at low target rates-- if it does a
minimally-sized write, it will sleep one second like before-- if it
writes more, it will sleep longer.

Hi Mike,

Got it. Now I understand why you set dc->writeback_rate_minimum	to 8,
it's reasonable and thoughtful. I agree with you. Thanks for your patent
explaining.

quoted

[snip]
I see. This is a situation I missed in my environment. On my desktop
machine they are all SSD. My testing server is too noisy and overwhelm
groan of my hard disks. Thanks for your information :-)

I have one more question for this line "When writing back faster, they
are quieter.  This will at least do it 2.5x less often."

Current bcache code indeed writes back 4K per-second at minimum
writeback rate if I understand correctly. Your patch makes the minimum
writeback rate to 4K every 2.5 seconds, it makes writing back slower not
faster. Do I miss something here ?

It is still doing the bad thing, but only 40% of the bad thing and
hopefully will wear out the disk slower.  I want to make further
changes to do it even less (and let disks spin down).

The value of 2.5 seconds is chosen because it is half the speed that
we recalculate the target rate, by default.  If the rate increases, we
won't be sleeping too long before we start writing faster.  (i.e. our
typical reaction time is now 5 + 2.5/2 = 6.25 seconds, instead of 5 +
1/2 = 5.5 seconds)

OK I see. So the interval for idle time writeback will be longer, but
not too long. Interesting, you want it longer, I want it shorter :-)
Don't worry I will solve the conflict from my side.

quoted

In a subsequent patch, there's additional things I'd like to do-- like
be willing to do no write after a wakeup if we are very far ahead.
That is, to allow the effective value to be much larger than 2.5
seconds.  This would potentially allow spindown on laptops, etc.  But
this change is still worthwhile on its own.

OK, looking forward to the subsequent patch :-)

This one is a first step just because it's very simple and
self-contained.  The next step on this path is a bit more difficult,
because it requires rewriting bch_next_delay to store the reciprocal
of what it currently does and bounding both the amount it can be
ahead, and the sleep interval.

Then please make this patch as simple/understandable as possible. Bcache
is not deserved for a very complicated delay calculation policy. If you
remember the discussions years ago for memory readahead and writeback,
you may find always simple things win.

quoted

Another thing that would be helpful would be to issue more than 1
write at a time, so that queue depth doesn't always equal 1.  Queue
depth=4 has about 40-50% more throughput--- that is, it completes the
4 IOPs in about 2.5x the time of one-- when writes are clustered to
the same region of the disk/short-stroked.  However this has a
potential cost in latency, so it needs to be carefully traded off.

It makes sense. But "dc->writeback_rate_minimum = 8" is one 4KB I/O,
then your patch is to make minimum I/O slower x2.5 times, does not
change I/O size. So this patch just makes writeback I/O less frequently,
not increase number of I/Os for each writeback ?

If writeback is doing minimum-sized I/Os with a minimum target rate,
they will happen at the same rate as before.  If what it finds from
the dirty keybuf is a bigger-than-minimum sized I/O, it will sleep
longer.  e.g. if it finds a contiguous 8k block, it will now sleep 2
seconds (and will achieve the target 4k/second as a result).  if it
finds a contiguous >=10k block, it will sleep the maximum 2.5 seconds.
This keeps the controller doing something meaningful even when stuff
being written is larger than the target amount written in one second.

Note that it has an effect even when the rate is above the minimum.
If we're trying to write 1MB/second, and we find a 1.5MB continguous
block to write, we'll wait 1.5 seconds.

It is clear to me. It's a good discussion, make things more clear and
understandable for me and other people.

Could you please add some important information from our discussion to
your patch commit log ? I do appreciate for this, then more people will
get to know how and why you design a P.I controller in this way, this is
valuable.

Now I know the patch is good, and have no comment for it. Thank you for
your informative response.

-- 
Coly Li

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help