Re: [PATCH] update bucket_in_use in real time
From: Coly Li <hidden>
Date: 2017-10-24 13:12:23
Also in:
linux-bcache
On 2017/10/24 下午4:57, tang.junhui@zte.com.cn wrote:
From: Tang Junhui <redacted> bucket_in_use is updated in gc thread which triggered by invalidating or writing sectors_to_gc dirty data, It's a long interval. Therefore, when we use it to compare with the threshold, it is often not timely, which leads to inaccurate judgment and often results in bucket depletion. We have send a patch before, by the means of updating bucket_in_use periodically In gc thread, which Coly thought that would lead high latency, In this patch, we add avail_nbuckets to record the count of available buckets, and we calculate bucket_in_use when alloc or free bucket in real time. Signed-off-by: Tang Junhui <redacted>
Hi Junhui, Thanks for the patch, this is a start to solve gc latency. Please check my comments in line.
quoted hunk ↗ jump to hunk
--- drivers/md/bcache/alloc.c | 10 ++++++++++ drivers/md/bcache/bcache.h | 1 + drivers/md/bcache/btree.c | 17 ++++++++++------- drivers/md/bcache/btree.h | 1 + 4 files changed, 22 insertions(+), 7 deletions(-) mode change 100644 => 100755 drivers/md/bcache/alloc.c mode change 100644 => 100755 drivers/md/bcache/bcache.h mode change 100644 => 100755 drivers/md/bcache/btree.c mode change 100644 => 100755 drivers/md/bcache/btree.hdiff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c old mode 100644 new mode 100755 index ca4abe1..89a5e35 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c@@ -439,6 +439,11 @@ long bch_bucket_alloc(struct cache *ca, unsigned reserve, bool wait) b->prio = INITIAL_PRIO; } + if(ca->set->avail_nbuckets > 0) { + ca->set->avail_nbuckets--; + bch_update_bucket_in_use(ca->set); + } +
ca->set->avai_nuckets-- is a read-modify-update operation, which means
concurrent update will overwrite value update, imaging,
in-memory thread 1 thread 2
n = 4
v = n n = 4 v <- 4
n = 4 v <- 4
v-- n = 4 v <- 3
n = 4 v <- 3
n = v n = 3 n <- 3
n = 3 n <- 3
n = 3
In this case a correct n value should be 2, but n value in memory is 3.
Use atomic routine atomc_dec() will solve such issue.
quoted hunk ↗ jump to hunk
return r; }@@ -446,6 +451,11 @@ void __bch_bucket_free(struct cache *ca, struct bucket *b) { SET_GC_MARK(b, 0); SET_GC_SECTORS_USED(b, 0); + + if(ca->set->avail_nbuckets < ca->set->nbuckets) { + ca->set->avail_nbuckets++; + bch_update_bucket_in_use(ca->set); + }
Here we should use atomic_inc() to avail_nbuckets, the reason is similar to bch_bucket_alloc().
quoted hunk ↗ jump to hunk
} void bch_bucket_free(struct cache_set *c, struct bkey *k)diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h old mode 100644 new mode 100755 index dee542f..275b29c --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h@@ -580,6 +580,7 @@ struct cache_set { uint8_t need_gc; struct gc_stat gc_stats; size_t nbuckets; + size_t avail_nbuckets;
Here avail_nbuckets should be atomic_t.
quoted hunk ↗ jump to hunk
struct task_struct *gc_thread; /* Where in the btree gc currently is */diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c old mode 100644 new mode 100755 index 866dcf7..1ccf0c3 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c@@ -1240,6 +1240,11 @@ void bch_initial_mark_key(struct cache_set *c, int level, struct bkey *k) __bch_btree_mark_key(c, level, k); } +void bch_update_bucket_in_use(struct cache_set *c) +{ + c->gc_stats.in_use = (c->nbuckets - c->avail_nbuckets) * 100 / c->nbuckets; +}
Here should use atomic_read() to c->avail_nbuckets.
quoted hunk ↗ jump to hunk
+ static bool btree_gc_mark_node(struct btree *b, struct gc_stat *gc) { uint8_t stale = 0;@@ -1651,9 +1656,8 @@ static void btree_gc_start(struct cache_set *c) mutex_unlock(&c->bucket_lock); } -static size_t bch_btree_gc_finish(struct cache_set *c) +static void bch_btree_gc_finish(struct cache_set *c) { - size_t available = 0; struct bucket *b; struct cache *ca; unsigned i;@@ -1690,6 +1694,7 @@ static size_t bch_btree_gc_finish(struct cache_set *c) } rcu_read_unlock(); + c->avail_nbuckets = 0; for_each_cache(ca, c, i) { uint64_t *i;@@ -1711,18 +1716,16 @@ static size_t bch_btree_gc_finish(struct cache_set *c) BUG_ON(!GC_MARK(b) && GC_SECTORS_USED(b)); if (!GC_MARK(b) || GC_MARK(b) == GC_MARK_RECLAIMABLE) - available++; + c->avail_nbuckets++;
In bch_bucket_alloc(), avail_nbuckets is decreased and GC_MARK_RECLAIMABLE might be marked to bucket. Because c->avail_nbuckets is updated without c->bucket_lock protect, it is possible that c->avail_nbuckets is decreased in bch_buckets_alloc() and increased in bch_btree_gc_finish(), which results an inaccurate available buckets value.
} } mutex_unlock(&c->bucket_lock); - return available;
A solution is to still keep available++ in the above code, and use atomic_set(&c->avail_nbuckets, available) at the last line of this function. By this method, if a bucket is in a small window that just allocated (mark reclaimable) but not written (mark dirty), and bch_btree_gc_finish() is called, c->avail_nbuckets can be consistent by overwritten by atomic_set(&c->avail_nbuckets, available). Other wise we will lost 1 counter. How do you think about it ?
quoted hunk ↗ jump to hunk
} static void bch_btree_gc(struct cache_set *c) { int ret; - unsigned long available; struct gc_stat stats; struct closure writes; struct btree_op op;@@ -1745,14 +1748,14 @@ static void bch_btree_gc(struct cache_set *c) pr_warn("gc failed!"); } while (ret); - available = bch_btree_gc_finish(c); + bch_btree_gc_finish(c); wake_up_allocators(c); bch_time_stats_update(&c->btree_gc_time, start_time); stats.key_bytes *= sizeof(uint64_t); stats.data <<= 9; - stats.in_use = (c->nbuckets - available) * 100 / c->nbuckets; + stats.in_use = (c->nbuckets - c->avail_nbuckets) * 100 / c->nbuckets;
atomic_read to c->avail_nbuckets should be used here.
quoted hunk ↗ jump to hunk
memcpy(&c->gc_stats, &stats, sizeof(struct gc_stat)); trace_bcache_gc_end(c);diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h old mode 100644 new mode 100755 index 73da1f5..dabcde8 --- a/drivers/md/bcache/btree.h +++ b/drivers/md/bcache/btree.h@@ -305,5 +305,6 @@ bool bch_keybuf_check_overlapping(struct keybuf *, struct bkey *, struct keybuf_key *bch_keybuf_next(struct keybuf *); struct keybuf_key *bch_keybuf_next_rescan(struct cache_set *, struct keybuf *, struct bkey *, keybuf_pred_fn *); +void bch_update_bucket_in_use(struct cache_set *c); #endif
Available nbuckets should be counted when a clean bucket is set to dirty. Fortunately I find once a bucket is new allocated, it will be dirty and no error condition happens between the bucket is allocated and dirtied. So it works to count available bucket numbers in bch_bucket_alloc() and bch_bucket_free(), it is ok to me at this moment. Thanks. Coly Li