Re: [PATCH v2 2/2] btrfs: zoned: automatically reclaim zones

From: Naohiro Aota <naohiro.aota@wdc.com>
Date: 2021-03-23 07:11:50

On Fri, Mar 19, 2021 at 01:59:02PM -0400, Josef Bacik wrote:

On 3/19/21 6:48 AM, Johannes Thumshirn wrote:

quoted

When a file gets deleted on a zoned file system, the space freed is not
returned back into the block group's free space, but is migrated to
zone_unusable.

As this zone_unusable space is behind the current write pointer it is not
possible to use it for new allocations. In the current implementation a
zone is reset once all of the block group's space is accounted as zone
unusable.

This behaviour can lead to premature ENOSPC errors on a busy file system.

Instead of only reclaiming the zone once it is completely unusable,
kick off a reclaim job once the amount of unusable bytes exceeds a user
configurable threshold between 51% and 100%. It can be set per mounted
filesystem via the sysfs tunable bg_reclaim_threshold which is set to 75%
per default.

Similar to reclaiming unused block groups, these dirty block groups are
added to a to_reclaim list and then on a transaction commit, the reclaim
process is triggered but after we deleted unused block groups, which will
free space for the relocation process.

Signed-off-by: Johannes Thumshirn <redacted>
---

AFAICT sysfs_create_files() does not have the ability to provide a is_visible
callback, so the bg_reclaim_threshold sysfs file is visible for non zoned
filesystems as well, even though only for zoned filesystems we're adding block
groups to the reclaim list. I'm all ears for a approach that is sensible in
this regard.


  fs/btrfs/block-group.c       | 84 ++++++++++++++++++++++++++++++++++++
  fs/btrfs/block-group.h       |  2 +
  fs/btrfs/ctree.h             |  3 ++
  fs/btrfs/disk-io.c           | 11 +++++
  fs/btrfs/free-space-cache.c  |  9 +++-
  fs/btrfs/sysfs.c             | 35 +++++++++++++++
  fs/btrfs/volumes.c           |  2 +-
  fs/btrfs/volumes.h           |  1 +
  include/trace/events/btrfs.h | 12 ++++++
  9 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 9ae3ac96a521..af9026795ddd 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c

@@ -1485,6 +1485,80 @@ void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
  	spin_unlock(&fs_info->unused_bgs_lock);
  }
+void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_block_group *bg;
+	struct btrfs_space_info *space_info;
+	int ret = 0;
+
+	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
+		return;
+
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
+		return;
+
+	mutex_lock(&fs_info->reclaim_bgs_lock);
+	while (!list_empty(&fs_info->reclaim_bgs)) {
+		bg = list_first_entry(&fs_info->reclaim_bgs,
+				      struct btrfs_block_group,
+				      bg_list);
+		list_del_init(&bg->bg_list);
+
+		space_info = bg->space_info;
+		mutex_unlock(&fs_info->reclaim_bgs_lock);
+
+		/* Don't want to race with allocators so take the groups_sem */
+		down_write(&space_info->groups_sem);
+
+		spin_lock(&bg->lock);
+		if (bg->reserved || bg->pinned || bg->ro) {

How do we deal with backup supers in zoned again?  Will they show up as
readonly?  If so we may not want the bg->ro check, but I may be insane.

No superblock/backups are placed into a zone composing a block group,
because, if placed, it becomes a hole blocking sequential writes. The
zones containing superblock/backups are reserved and no device extents
are allocated there. So, bg->ro == 0, if the block group is read-write.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help