Thread (10 messages) 10 messages, 4 authors, 2021-08-16

Re: [PATCH 2/2] btrfs: subpage: pack all subpage bitmaps into a larger bitmap

From: Qu Wenruo <hidden>
Date: 2021-08-16 13:42:17


On 2021/8/16 下午6:26, Nikolay Borisov wrote:

On 16.08.21 г. 9:00, Qu Wenruo wrote:
quoted
Currently we use u16 bitmap to make 4k sectorsize work for 64K page
size.

But this u16 bitmap is not large enough to contain larger page size like
128K, nor is space efficient for 16K page size.

To handle both cases, here we pack all subpage bitmaps into a larger
bitmap, now btrfs_subpage::bitmaps[] will be the ultimate bitmap for
subpage usage.

Each sub-bitmap will has its start bit number recorded in
btrfs_subpage_info::*_start, and its bitmap length will be recorded in
btrfs_subpage_info::bitmap_nr_bits.

All subpage bitmap operations will be converted from using direct u16
operations to bitmap operations, with above *_start calculated.

For 64K page size with 4K sectorsize, this should not cause much difference.
While for 16K page size, we will only need 1 unsigned long (u32) to
restore the bitmap.
And will be able to support 128K page size in the future.

Signed-off-by: Qu Wenruo <redacted>
---
  fs/btrfs/extent_io.c |  58 ++++++++++--------
  fs/btrfs/subpage.c   | 137 +++++++++++++++++++++++++++++--------------
  fs/btrfs/subpage.h   |  19 +-----
  3 files changed, 130 insertions(+), 84 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 543f87ea372e..e428d6208bb7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3865,10 +3865,9 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
  	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
  	u64 orig_start = *start;
  	/* Declare as unsigned long so we can use bitmap ops */
-	unsigned long dirty_bitmap;
  	unsigned long flags;
-	int nbits = (orig_start - page_offset(page)) >> fs_info->sectorsize_bits;
-	int range_start_bit = nbits;
+	int nbits = offset_in_page(orig_start) >> fs_info->sectorsize_bits;
nbits is rather dubious, by reading it I'd expect it to hold a
count/number of bits. In fact it holds a sector index. A better name
would be sector_index or block_index.
Not sure if "sector" in the context of bitmap is preferred.

But since nbits is only used once, deleting nbits completely would be
better.
quoted
+	int range_start_bit = nbits + fs_info->subpage_info->dirty_start;
  	int range_end_bit;

  	/*
@@ -3883,11 +3882,14 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,

  	/* We should have the page locked, but just in case */
  	spin_lock_irqsave(&subpage->lock, flags);
-	dirty_bitmap = subpage->dirty_bitmap;
+	bitmap_next_set_region(subpage->bitmaps, &range_start_bit, &range_end_bit,
+			       fs_info->subpage_info->dirty_start +
+			       fs_info->subpage_info->bitmap_nr_bits);
  	spin_unlock_irqrestore(&subpage->lock, flags);

-	bitmap_next_set_region(&dirty_bitmap, &range_start_bit, &range_end_bit,
-			       BTRFS_SUBPAGE_BITMAP_SIZE);
+	range_start_bit -= fs_info->subpage_info->dirty_start;
+	range_end_bit -= fs_info->subpage_info->dirty_start;
+
  	*start = page_offset(page) + range_start_bit * fs_info->sectorsize;
  	*end = page_offset(page) + range_end_bit * fs_info->sectorsize;
  }
@@ -4613,7 +4615,7 @@ static int submit_eb_subpage(struct page *page,
  	int submitted = 0;
  	u64 page_start = page_offset(page);
  	int bit_start = 0;
-	const int nbits = BTRFS_SUBPAGE_BITMAP_SIZE;
+	const int nbits = fs_info->subpage_info->bitmap_nr_bits;
nit: do we really need nbits, given that
fs_info->subpage_info->bitmap_nr_bits fits on one line inside the while
expression. Also see the discrepancy - nbits here indeed holds a "number
of bits value".
Yeah, deleteing nbits completely here would be better.
quoted
  	int sectors_per_node = fs_info->nodesize >> fs_info->sectorsize_bits;
  	int ret;
<snip>
quoted
@@ -7178,32 +7181,41 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
  	}
  }

+#define GANG_LOOKUP_SIZE	16
  static struct extent_buffer *get_next_extent_buffer(
  		struct btrfs_fs_info *fs_info, struct page *page, u64 bytenr)
  {
-	struct extent_buffer *gang[BTRFS_SUBPAGE_BITMAP_SIZE];
+	struct extent_buffer *gang[GANG_LOOKUP_SIZE];
  	struct extent_buffer *found = NULL;
  	u64 page_start = page_offset(page);
-	int ret;
-	int i;
+	u64 cur = page_start;

  	ASSERT(in_range(bytenr, page_start, PAGE_SIZE));
-	ASSERT(PAGE_SIZE / fs_info->nodesize <= BTRFS_SUBPAGE_BITMAP_SIZE);
  	lockdep_assert_held(&fs_info->buffer_lock);

-	ret = radix_tree_gang_lookup(&fs_info->buffer_radix, (void **)gang,
-			bytenr >> fs_info->sectorsize_bits,
-			PAGE_SIZE / fs_info->nodesize);
-	for (i = 0; i < ret; i++) {
-		/* Already beyond page end */
-		if (gang[i]->start >= page_start + PAGE_SIZE)
-			break;
-		/* Found one */
-		if (gang[i]->start >= bytenr) {
-			found = gang[i];
-			break;
+	while (cur < page_start + PAGE_SIZE) {
+		int ret;
+		int i;
+
+		ret = radix_tree_gang_lookup(&fs_info->buffer_radix,
+				(void **)gang, cur >> fs_info->sectorsize_bits,
+				min_t(unsigned int, GANG_LOOKUP_SIZE,
+				      PAGE_SIZE / fs_info->nodesize));
+		if (ret == 0)
+			goto out;
+		for (i = 0; i < ret; i++) {
+			/* Already beyond page end */
+			if (gang[i]->start >= page_start + PAGE_SIZE)
+				goto out;
+			/* Found one */
+			if (gang[i]->start >= bytenr) {
+				found = gang[i];
+				break;
Instead of break, don't you want to do got out? Break would exit from
the inner for() loop and continue at the outter while loop.
Damn, why my subpage tests didn't catch the problem...

I guess since the function is only called in
try_release_subpage_extent_buffer(), and it only needs to exhaust all
the ebs of the page, it doesn't really care about if we returned some
wrong eb.

But still, it should be "goto out".

Thanks for catching this one.
quoted
+			}
  		}
+		cur = gang[ret - 1]->start + gang[ret - 1]->len;
  	}
+out:
  	return found;
  }
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 014256d47beb..9d6da9f2d77e 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -139,10 +139,14 @@ int btrfs_alloc_subpage(const struct btrfs_fs_info *fs_info,
  			struct btrfs_subpage **ret,
  			enum btrfs_subpage_type type)
  {
+	unsigned int real_size;
+
  	if (fs_info->sectorsize == PAGE_SIZE)
  		return 0;
offtopic: Instead of having a bunch of those checks can't we replace
them with ASSERTS and ensure that the decision whether we do subpage or
not is taken at a higher level ?
Nope, in this particular call site, btrfs_alloc_subpage() can be called
with regular page size.

I guess it's better to rename this function like btrfs_prepare_page()?
quoted
-	*ret = kzalloc(sizeof(struct btrfs_subpage), GFP_NOFS);
+	real_size = BITS_TO_LONGS(fs_info->subpage_info->total_nr_bits) *
+		    sizeof(unsigned long) + sizeof(struct btrfs_subpage);
real_size = struct_size(*ret, bitmaps,
BITS_TO_LONGS(fs_info->subpage_info->total_nr_bits));

And the calling convention for this function is awful. Just make it
return struct btrfs_subpage * and kill the dreadful **ret....
Above two advices indeed sound much better.
quoted
+	*ret = kzalloc(real_size, GFP_NOFS);
  	if (!*ret)
  		return -ENOMEM;
  	spin_lock_init(&(*ret)->lock);
@@ -324,37 +328,60 @@ void btrfs_page_end_writer_lock(const struct btrfs_fs_info *fs_info,
  		unlock_page(page);
  }

-/*
- * Convert the [start, start + len) range into a u16 bitmap
- *
- * For example: if start == page_offset() + 16K, len = 16K, we get 0x00f0.
- */
-static u16 btrfs_subpage_calc_bitmap(const struct btrfs_fs_info *fs_info,
-		struct page *page, u64 start, u32 len)
+static bool bitmap_test_range_all_set(unsigned long *addr, unsigned start,
+				      unsigned int nbits)
  {
-	const int bit_start = offset_in_page(start) >> fs_info->sectorsize_bits;
-	const int nbits = len >> fs_info->sectorsize_bits;
+	unsigned int found_zero;

-	btrfs_subpage_assert(fs_info, page, start, len);
+	found_zero = find_next_zero_bit(addr, start + nbits, start);
2 argument of find_next_zero_bit is 'size' which would be nbits as it
expects the size to be in bits , not start + nbit. Every logical bitmap
in ->bitmaps is defined by [start, nbits] no ? Unfortunately there is a
discrepancy between the order of documentation and the order of actual
arguments in the definition of this function....
IT'S A TRAP!

Paramater 2 (@size) is the total size of the search range, it should be
larger than the 3rd parameter.

You can check the _find_next_bit(), where there are a lot of @start >=
@nbits checks.

I guess this is the tricky part when we pack the bitmaps into one larger
bitmap...
quoted
+	if (found_zero == start + nbits)
+		return true;
+	return false;
+}

-	/*
-	 * Here nbits can be 16, thus can go beyond u16 range. We make the
-	 * first left shift to be calculate in unsigned long (at least u32),
-	 * then truncate the result to u16.
-	 */
-	return (u16)(((1UL << nbits) - 1) << bit_start);
+static bool bitmap_test_range_all_zero(unsigned long *addr, unsigned start,
+				       unsigned int nbits)
+{
+	unsigned int found_set;
+
+	found_set = find_next_bit(addr, start + nbits, start);
+	if (found_set == start + nbits)
+		return true;
+	return false;
This can be implemented by simply doing !bitmap_test_range_all_set no
need to have 2 separate implementations. Given those 2 functions are
only used in the definition oof the subpage_test_bitmap* macros I'da say
remove one of them and simply code the macros as
bitmap_test_range_all_set and !bitmap_test_range_all_set respectively.
TRAP CARD!

bitmap_test_range_all_set() will only return 1 if all bits in the range
are 1.

But bitmap_test_range_all_zero() should only return 1 if all bits in the
range are 0.

What if only part of the range is 1?

Thanks,
Qu

quoted
  }
<snip>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help