Re: Re: [PATCH] bcache: fix issue of writeback rate at minimum 1 blockper second
From: Eric Wheeler <hidden>
Date: 2017-06-20 21:36:56
On Sat, 17 Jun 2017, bcache@lists.ewheeler.net wrote:
On Fri, 9 Jun 2017, tang.junhui@zte.com.cn wrote:quoted
Hello Eric, > It would be nice if ZERO_RATE_DELAY_NS was configurable via sysfs, as some > might want a longer delay than 30s. How about we reuse the dc->writeback_delay configuration for the delay time?That seems reasonable. For small writeback caches where the cache could be exhausted while in schedule_timeout(), will IOs stall until schedule_timeout() allows writeback_thread to proceed?
Hi Tang, Also, I wonder, is schedule_timeout the right place to do this? You would delay the thread for 30 seconds (by default) even in the case of BCACHE_DEV_DETACHING if I am reading this right. This would make for a slow shutdown if it is waiting in schedule_timeout. Would it would be better to handle this by setting d->next = WRITEBACK_WAIT_CYCLE? After all, we are trying to avoid doing 1-secondly writes if the dirty data is within threshold. -- Eric Wheeler
quoted
> Also, if zero-rate has been hit, why not have a full-writeback of whatever > remains so we only use the ZERO_RATE_DELAY_NS when dirty_data==0? No > sense in leaving it slightly dirty if there is only a block or two still > remaining. I think it's risky, write IO maybe in continue, we should to accumulate more dirty dataok, agreed---also I note that writeback is still 10% dirty by default, so it could be a large amount of data. The cache is always seeking to hit dirty_ratio and this is what the 1-sector writeback fix is targetd for.quoted
to make writeback sequentially, otherwise, the full-writeback would be inefficient when there is only a little dirty, as it is small and random IOs, and the write IO are in continue. So, Eric, would you agree to just modify this patch to reuse dc->writeback_delay as the delay time?Yes that seems OK, unless someone else says otherwise.quoted
Acctually, we do many optimizations for writeback in our product, such as the issue that the dirty device never get clean (we start full-writeback when we find there is no more write IOs in comming), these modifications make bcache works better.I would be interested in seeing other optimization patches if you have any. Thank you for your work on this! -Ericquoted
Regards, Tang Junhui 原始邮件 发件人: <bcache@lists.ewheeler.net>; 收件人: <i@coly.li>; 抄送人:唐军辉10074136; <kent.overstreet@gmail.com>; <linux-bcache@vger.kernel.org>; 日 期 :2017年06月09日 09:43 主 题 :Re: [PATCH] bcache: fix issue of writeback rate at minimum 1 blockper second On Fri, 9 Jun 2017, Coly Li wrote: > On 2017/6/8 下午5:34, tang.junhui@zte.com.cn wrote: > > From: Tang Junhui <tang.junhui@zte.com.cn> > > > > When there is not enough data in writeback cache, let the writeback rate > > to be 0, and delay 30 seconds in read_dirty(). > > > > I feel uncomfortable to the above idea. Is the purpose of this change is > to make writeback more efficient with more dirty data ? Or some other > reason that you want to make this change? > > When data writing back to the backing device, most of the writes are > random. If the backing device is a hard disk, the write back rate will > be very slow. Such a longer delay only makes more data accumulated in > writeback cache, does not help the writeback rate indeed. > > Do you observe performance improvement by this patch ? If yes, I'd like > to see the performance number. Its not a performance issue, it is a power saving issue. Laptop users have reported that zero-rate writeback writes 512-bytes every second and never lets the HDD spin down and several users on the list have asked for such a bugfix. It would be nice if ZERO_RATE_DELAY_NS was configurable via sysfs, as some might want a longer delay than 30s. Also, if zero-rate has been hit, why not have a full-writeback of whatever remains so we only use the ZERO_RATE_DELAY_NS when dirty_data==0? No sense in leaving it slightly dirty if there is only a block or two still remaining. -- Eric Wheeler > > Thanks. > > Coly > > > > Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> > > --- > > drivers/md/bcache/util.c | 17 ++++++++++++----- > > drivers/md/bcache/writeback.c | 2 +- > > 2 files changed, 13 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c > > index dde6172..cf750df 100644 > > --- a/drivers/md/bcache/util.c > > +++ b/drivers/md/bcache/util.c > > @@ -15,6 +15,8 @@ > > > > #include "util.h" > > > > +#define ZERO_RATE_DELAY_NS 30*NSEC_PER_SEC > > + > > #define simple_strtoint(c, end, base) simple_strtol(c, end, base) > > #define simple_strtouint(c, end, base) simple_strtoul(c, end, base) > > > > @@ -209,13 +211,18 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done) > > { > > uint64_t now = local_clock(); > > > > - d->next += div_u64(done * NSEC_PER_SEC, d->rate); > > + if (!d->rate) { > > + d->next = now + ZERO_RATE_DELAY_NS; > > + } > > + else { > > + d->next += div_u64(done * NSEC_PER_SEC, d->rate); > > > > - if (time_before64(now + NSEC_PER_SEC, d->next)) > > - d->next = now + NSEC_PER_SEC; > > + if (time_before64(now + NSEC_PER_SEC, d->next)) > > + d->next = now + NSEC_PER_SEC; > > > > - if (time_after64(now - NSEC_PER_SEC * 2, d->next)) > > - d->next = now - NSEC_PER_SEC * 2; > > + if (time_after64(now - NSEC_PER_SEC * 2, d->next)) > > + d->next = now - NSEC_PER_SEC * 2; > > + } > > > > return time_after64(d->next, now) > > ? div_u64(d->next - now, NSEC_PER_SEC / HZ) > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > > index 69e1ae5..8fac280 100644 > > --- a/drivers/md/bcache/writeback.c > > +++ b/drivers/md/bcache/writeback.c > > @@ -60,7 +60,7 @@ static void __update_writeback_rate(struct cached_dev *dc) > > > > dc->writeback_rate.rate = > > clamp_t(int64_t, (int64_t) dc->writeback_rate.rate + change, > > - 1, NSEC_PER_MSEC); > > + 0, NSEC_PER_MSEC); > > > > dc->writeback_rate_proportional = proportional; > > dc->writeback_rate_derivative = derivative; > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >