Re: [PATCH] bcache: smooth writeback rate control
From: Coly Li <hidden>
Date: 2017-09-21 19:20:57
On 2017/9/20 下午11:50, Michael Lyle wrote:
On Wed, Sep 20, 2017 at 2:07 PM, Coly Li [off-list ref] wrote:quoted
I get your point here. Current code sets minimum writeback rate to 1 sector, and indeed it writes 4K. But parameter "done" sent to bch_next_delay() is 1 sector, indeed the correct value should be 8 sectors.Not quite. The value sent to bch_next_delay is 8 sectors. So then the code wants to sleep 8 seconds, but it is only willing to sleep at most one second. In other words, when rate is low, the controller becomes ineffective and does one write of any size no matter what. This keeps the controller effective at low target rates-- if it does a minimally-sized write, it will sleep one second like before-- if it writes more, it will sleep longer.
Hi Mike, Got it. Now I understand why you set dc->writeback_rate_minimum to 8, it's reasonable and thoughtful. I agree with you. Thanks for your patent explaining.
quoted
[snip] I see. This is a situation I missed in my environment. On my desktop machine they are all SSD. My testing server is too noisy and overwhelm groan of my hard disks. Thanks for your information :-) I have one more question for this line "When writing back faster, they are quieter. This will at least do it 2.5x less often." Current bcache code indeed writes back 4K per-second at minimum writeback rate if I understand correctly. Your patch makes the minimum writeback rate to 4K every 2.5 seconds, it makes writing back slower not faster. Do I miss something here ?It is still doing the bad thing, but only 40% of the bad thing and hopefully will wear out the disk slower. I want to make further changes to do it even less (and let disks spin down). The value of 2.5 seconds is chosen because it is half the speed that we recalculate the target rate, by default. If the rate increases, we won't be sleeping too long before we start writing faster. (i.e. our typical reaction time is now 5 + 2.5/2 = 6.25 seconds, instead of 5 + 1/2 = 5.5 seconds)
OK I see. So the interval for idle time writeback will be longer, but not too long. Interesting, you want it longer, I want it shorter :-) Don't worry I will solve the conflict from my side.
quoted
quoted
In a subsequent patch, there's additional things I'd like to do-- like be willing to do no write after a wakeup if we are very far ahead. That is, to allow the effective value to be much larger than 2.5 seconds. This would potentially allow spindown on laptops, etc. But this change is still worthwhile on its own.OK, looking forward to the subsequent patch :-)This one is a first step just because it's very simple and self-contained. The next step on this path is a bit more difficult, because it requires rewriting bch_next_delay to store the reciprocal of what it currently does and bounding both the amount it can be ahead, and the sleep interval.
Then please make this patch as simple/understandable as possible. Bcache is not deserved for a very complicated delay calculation policy. If you remember the discussions years ago for memory readahead and writeback, you may find always simple things win.
quoted
quoted
Another thing that would be helpful would be to issue more than 1 write at a time, so that queue depth doesn't always equal 1. Queue depth=4 has about 40-50% more throughput--- that is, it completes the 4 IOPs in about 2.5x the time of one-- when writes are clustered to the same region of the disk/short-stroked. However this has a potential cost in latency, so it needs to be carefully traded off.It makes sense. But "dc->writeback_rate_minimum = 8" is one 4KB I/O, then your patch is to make minimum I/O slower x2.5 times, does not change I/O size. So this patch just makes writeback I/O less frequently, not increase number of I/Os for each writeback ?If writeback is doing minimum-sized I/Os with a minimum target rate, they will happen at the same rate as before. If what it finds from the dirty keybuf is a bigger-than-minimum sized I/O, it will sleep longer. e.g. if it finds a contiguous 8k block, it will now sleep 2 seconds (and will achieve the target 4k/second as a result). if it finds a contiguous >=10k block, it will sleep the maximum 2.5 seconds. This keeps the controller doing something meaningful even when stuff being written is larger than the target amount written in one second. Note that it has an effect even when the rate is above the minimum. If we're trying to write 1MB/second, and we find a 1.5MB continguous block to write, we'll wait 1.5 seconds.
It is clear to me. It's a good discussion, make things more clear and understandable for me and other people. Could you please add some important information from our discussion to your patch commit log ? I do appreciate for this, then more people will get to know how and why you design a P.I controller in this way, this is valuable. Now I know the patch is good, and have no comment for it. Thank you for your informative response. -- Coly Li