Re: [RFC PATCH] btrfs: zoned: auto reclaim low mostly full block-groups first
From: Johannes Thumshirn <hidden>
Date: 2021-09-21 07:31:43
On 20/09/2021 17:50, David Sterba wrote:
On Tue, Sep 21, 2021 at 12:11:01AM +0900, Johannes Thumshirn wrote:quoted
Currently auto reclaim of unusable zones reclaims the block-groups in the order they have been added to the reclaim list. Sort the list so we have the block-groups with the least amount of bytes to preserve at the beginning before starting the garbage collection loop.Makes sense as an optimization.quoted
Signed-off-by: Johannes Thumshirn <redacted> --- fs/btrfs/block-group.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 46fdef7bbe20..d90297fb99e1 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c@@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 +#include <linux/list_sort.h> + #include "misc.h" #include "ctree.h" #include "block-group.h"@@ -1486,6 +1488,21 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg) spin_unlock(&fs_info->unused_bgs_lock); } +/* + * We want block groups with a low number of used bytes to be in the beginning + * of the list, so they will get reclaimed first. + */ +static int reclaim_bgs_cmp(void *unused, const struct list_head *a, + const struct list_head *b) +{ + const struct btrfs_block_group *bg1, *bg2; + + bg1 = list_entry(a, struct btrfs_block_group, bg_list); + bg2 = list_entry(b, struct btrfs_block_group, bg_list); + + return bg1->used - bg2->used; +} + void btrfs_reclaim_bgs_work(struct work_struct *work) { struct btrfs_fs_info *fs_info =@@ -1510,6 +1527,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work) } spin_lock(&fs_info->unused_bgs_lock); + list_sort(NULL, &fs_info->reclaim_bgs, reclaim_bgs_cmp);The sort is under a spinlock, though it's probably not a highly contended lock, I think we should try to move it outside. Something like lock() list_splice_init(&splice, &reclaim_bgs) unlock() list_sort(&splice); while (!list_empty(splice)) { } We already use splice in the again_list so it could build on top of it. OTOH, it may not be absolutelly necessary to do the sort outside of the lock but rather because as a matter of good programming hygiene to not introduce unnecessary delays due to contended lock here and there that could potentially cascade further.
I'm expecting the number of entries in the list to be in the single or two digit range, so the sort should be rather quick. But I agree using a spliced list looks more future proof.