Thread (25 messages) 25 messages, 2 authors, 2024-08-10

Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()

From: Zhang Yi <hidden>
Date: 2024-08-09 03:35:55
Also in: linux-fsdevel, lkml

On 2024/8/9 2:36, Jan Kara wrote:
On Thu 08-08-24 19:18:30, Zhang Yi wrote:
quoted
On 2024/8/8 1:41, Jan Kara wrote:
quoted
On Fri 02-08-24 19:51:16, Zhang Yi wrote:
quoted
From: Zhang Yi <yi.zhang@huawei.com>

Now that we update data reserved space for delalloc after allocating
new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is
enabled, we also need to query the extents_status tree to calculate the
exact reserved clusters. This is complicated now and it appears that
it's better to do this job in ext4_es_insert_extent(), because
__es_remove_extent() have already count delalloc blocks when removing
delalloc extents and __revise_pending() return new adding pending count,
we could update the reserved blocks easily in ext4_es_insert_extent().

Thers is one special case needs to concern is the quota claiming, when
bigalloc is enabled, if the delayed cluster allocation has been raced
by another no-delayed allocation(e.g. from fallocate) which doesn't
cover the delayed blocks:

  |<       one cluster       >|
  hhhhhhhhhhhhhhhhhhhdddddddddd
  ^            ^
  |<          >| < fallocate this range, don't claim quota again

We can't claim quota as usual because the fallocate has already claimed
it in ext4_mb_new_blocks(), we could notice this case through the
removed delalloc blocks count.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
...
quoted
@@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 			__free_pending(pr);
 			pr = NULL;
 		}
+		pending = err3;
 	}
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
+	/*
+	 * Reduce the reserved cluster count to reflect successful deferred
+	 * allocation of delayed allocated clusters or direct allocation of
+	 * clusters discovered to be delayed allocated.  Once allocated, a
+	 * cluster is not included in the reserved count.
+	 *
+	 * When bigalloc is enabled, allocating non-delayed allocated blocks
+	 * which belong to delayed allocated clusters (from fallocate, filemap,
+	 * DIO, or clusters allocated when delalloc has been disabled by
+	 * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(),
+	 * so release the quota reservations made for any previously delayed
+	 * allocated clusters.
+	 */
+	resv_used = rinfo.delonly_cluster + pending;
+	if (resv_used)
+		ext4_da_update_reserve_space(inode, resv_used,
+					     rinfo.delonly_block);
I'm not sure I understand here. We are inserting extent into extent status
tree. We are replacing resv_used clusters worth of space with delayed
allocation reservation with normally allocated clusters so we need to
release the reservation (mballoc already reduced freeclusters counter).
That I understand. In normal case we should also claim quota because we are
converting from reserved into allocated state. Now if we allocated blocks
under this range (e.g. from fallocate()) without
EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here
instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is
related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating
blocks for this extent or not.
Oh, this is really complicated due to the bigalloc feature, please let me
explain it more clearly by listing all related situations.

There are 2 types of paths of allocating delayed/reserved cluster:
1. Normal case, normally allocate delayed clusters from the write back path.
2. Special case, allocate blocks under this delayed range, e.g. from
   fallocate().

There are 4 situations below:

A. bigalloc is disabled. This case is simple, after path 2, we don't need
   to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we
   set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is
   detected. If the flag is set, we must be replacing a delayed extent and
   rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal
   to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.
Right. So fallocate() will call ext4_map_blocks() and
ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED
which you (due to patch 2 of this series) transform into
EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc
accounting through in ext4_ext_map_blocks() but this patch moved the update
to ext4_es_insert_extent(). But there is one cornercase even here AFAICT:

Suppose fallocate is called for range 0..16k, we have delalloc extent at
8k..16k. In this case ext4_map_blocks() at block 0 will not find the
delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc
without using delalloc reservation but then ext4_es_insert_extent() will
still have rinfo.delonly_block > 0 so we claim the quota reservation
instead of releasing it?
After commit 6430dea07e85 ("ext4: correct the hole length returned by
ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
allocating range should not cover any delayed range. Then
ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
8K-16K in the second round, in this round, we are allocating a real
delayed range. Please below graph for details,

ext4_alloc_file_blocks() //0-16K
 ext4_map_blocks()  //0-16K
  ext4_es_lookup_extent() //find nothing
   ext4_ext_map_blocks(0)
    ext4_ext_determine_insert_hole() //change map range to 0-8K
   ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
 ext4_map_blocks()  //8-16K
  ext4_es_lookup_extent() //find delayed extent
  ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
    //allocate blocks under a whole delayed range,
    //use rinfo.delonly_block > 0 is okay

Hence the allocating range can't mixed with delayed and non-delayed extent
at a time, and the rinfo.delonly_block > 0 should work.
quoted
B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed
   cluster:
B0.Allocating a whole delayed cluster, this case is the same to A.

     |<         one cluster       >|
     ddddddd+ddddddd+ddddddd+ddddddd
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole range
I agree. In this case there's no difference.

 
quoted
B1.Allocating delayed blocks in a reserved cluster, this case is the same
   to A, too.

     |<         one cluster       >|
     hhhhhhh+hhhhhhh+ddddddd+ddddddd
                             ^^^^^^^
                             allocating this range
Yes, if the allocation starts within delalloc range, we will have
EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be >
0.
quoted
B2.Allocating blocks which doesn't cover the delayed blocks in one reserved
   cluster,

     |<         one cluster       >|
     hhhhhhh+hhhhhhh+hhhhhhh+ddddddd
     ^^^^^^^
     fallocating this range

  This case must from path 2, which means allocating blocks without
  EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must
  be 0 since we are not replacing any delayed extents, so
  rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED
  detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is
  not set. So I think we could use rinfo.delonly_block to identify this
  case.
Well, this is similar to the non-bigalloc case I was asking about above.
Why the allocated unwritten extent cannot extend past the start of delalloc
extent? I didn't find anything that would disallow that...
The same to above, ext4_ext_determine_insert_hole() should work fine for
this case.

Thanks,
Yi.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help