Re: [PATCH v2 06/10] ext4: update delalloc data reserve spcae in ext4_es_insert_extent()
From: Zhang Yi <hidden>
Date: 2024-08-09 03:35:55
Also in:
linux-fsdevel, lkml
On 2024/8/9 2:36, Jan Kara wrote:
On Thu 08-08-24 19:18:30, Zhang Yi wrote:quoted
On 2024/8/8 1:41, Jan Kara wrote:quoted
On Fri 02-08-24 19:51:16, Zhang Yi wrote:quoted
From: Zhang Yi <yi.zhang@huawei.com> Now that we update data reserved space for delalloc after allocating new blocks in ext4_{ind|ext}_map_blocks(), and if bigalloc feature is enabled, we also need to query the extents_status tree to calculate the exact reserved clusters. This is complicated now and it appears that it's better to do this job in ext4_es_insert_extent(), because __es_remove_extent() have already count delalloc blocks when removing delalloc extents and __revise_pending() return new adding pending count, we could update the reserved blocks easily in ext4_es_insert_extent(). Thers is one special case needs to concern is the quota claiming, when bigalloc is enabled, if the delayed cluster allocation has been raced by another no-delayed allocation(e.g. from fallocate) which doesn't cover the delayed blocks: |< one cluster >| hhhhhhhhhhhhhhhhhhhdddddddddd ^ ^ |< >| < fallocate this range, don't claim quota again We can't claim quota as usual because the fallocate has already claimed it in ext4_mb_new_blocks(), we could notice this case through the removed delalloc blocks count. Signed-off-by: Zhang Yi <yi.zhang@huawei.com>...quoted
@@ -926,9 +928,27 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk, __free_pending(pr); pr = NULL; } + pending = err3; } error: write_unlock(&EXT4_I(inode)->i_es_lock); + /* + * Reduce the reserved cluster count to reflect successful deferred + * allocation of delayed allocated clusters or direct allocation of + * clusters discovered to be delayed allocated. Once allocated, a + * cluster is not included in the reserved count. + * + * When bigalloc is enabled, allocating non-delayed allocated blocks + * which belong to delayed allocated clusters (from fallocate, filemap, + * DIO, or clusters allocated when delalloc has been disabled by + * ext4_nonda_switch()). Quota has been claimed by ext4_mb_new_blocks(), + * so release the quota reservations made for any previously delayed + * allocated clusters. + */ + resv_used = rinfo.delonly_cluster + pending; + if (resv_used) + ext4_da_update_reserve_space(inode, resv_used, + rinfo.delonly_block);I'm not sure I understand here. We are inserting extent into extent status tree. We are replacing resv_used clusters worth of space with delayed allocation reservation with normally allocated clusters so we need to release the reservation (mballoc already reduced freeclusters counter). That I understand. In normal case we should also claim quota because we are converting from reserved into allocated state. Now if we allocated blocks under this range (e.g. from fallocate()) without EXT4_GET_BLOCKS_DELALLOC_RESERVE, we need to release quota reservation here instead of claiming it. But I fail to see how rinfo.delonly_block > 0 is related to whether EXT4_GET_BLOCKS_DELALLOC_RESERVE was set when allocating blocks for this extent or not.Oh, this is really complicated due to the bigalloc feature, please let me explain it more clearly by listing all related situations. There are 2 types of paths of allocating delayed/reserved cluster: 1. Normal case, normally allocate delayed clusters from the write back path. 2. Special case, allocate blocks under this delayed range, e.g. from fallocate(). There are 4 situations below: A. bigalloc is disabled. This case is simple, after path 2, we don't need to distinguish path 1 and 2, when calling ext4_es_insert_extent(), we set EXT4_GET_BLOCKS_DELALLOC_RESERVE after EXT4_MAP_DELAYED bit is detected. If the flag is set, we must be replacing a delayed extent and rinfo.delonly_block must be > 0. So rinfo.delonly_block > 0 is equal to set EXT4_GET_BLOCKS_DELALLOC_RESERVE.Right. So fallocate() will call ext4_map_blocks() and ext4_es_lookup_extent() will find delayed extent and set EXT4_MAP_DELAYED which you (due to patch 2 of this series) transform into EXT4_GET_BLOCKS_DELALLOC_RESERVE. We used to update the delalloc accounting through in ext4_ext_map_blocks() but this patch moved the update to ext4_es_insert_extent(). But there is one cornercase even here AFAICT: Suppose fallocate is called for range 0..16k, we have delalloc extent at 8k..16k. In this case ext4_map_blocks() at block 0 will not find the delalloc extent but ext4_ext_map_blocks() will allocate 16k from mballoc without using delalloc reservation but then ext4_es_insert_extent() will still have rinfo.delonly_block > 0 so we claim the quota reservation instead of releasing it?
After commit 6430dea07e85 ("ext4: correct the hole length returned by
ext4_map_blocks()"), the fallocate range 0-16K would be divided into two
rounds. When we first calling ext4_map_blocks() with 0-16K, the map range
will be corrected to 0-8k by ext4_ext_determine_insert_hole() and the
allocating range should not cover any delayed range. Then
ext4_alloc_file_blocks() will call ext4_map_blocks() again to allocate
8K-16K in the second round, in this round, we are allocating a real
delayed range. Please below graph for details,
ext4_alloc_file_blocks() //0-16K
ext4_map_blocks() //0-16K
ext4_es_lookup_extent() //find nothing
ext4_ext_map_blocks(0)
ext4_ext_determine_insert_hole() //change map range to 0-8K
ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE) //allocate blocks under hole
ext4_map_blocks() //8-16K
ext4_es_lookup_extent() //find delayed extent
ext4_ext_map_blocks(EXT4_GET_BLOCKS_CREATE)
//allocate blocks under a whole delayed range,
//use rinfo.delonly_block > 0 is okay
Hence the allocating range can't mixed with delayed and non-delayed extent
at a time, and the rinfo.delonly_block > 0 should work.
quoted
B. bigalloc is enabled, there a 3 sub-cases of allocating a delayed cluster: B0.Allocating a whole delayed cluster, this case is the same to A. |< one cluster >| ddddddd+ddddddd+ddddddd+ddddddd ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ allocating the whole rangeI agree. In this case there's no difference.quoted
B1.Allocating delayed blocks in a reserved cluster, this case is the same to A, too. |< one cluster >| hhhhhhh+hhhhhhh+ddddddd+ddddddd ^^^^^^^ allocating this rangeYes, if the allocation starts within delalloc range, we will have EXT4_GET_BLOCKS_DELALLOC_RESERVE set and ndelonly_blocks will always be > 0.quoted
B2.Allocating blocks which doesn't cover the delayed blocks in one reserved cluster, |< one cluster >| hhhhhhh+hhhhhhh+hhhhhhh+ddddddd ^^^^^^^ fallocating this range This case must from path 2, which means allocating blocks without EXT4_GET_BLOCKS_DELALLOC_RESERVE. In this case, rinfo.delonly_block must be 0 since we are not replacing any delayed extents, so rinfo.delonly_block == 0 means allocate blocks without EXT4_MAP_DELAYED detected, which further means that EXT4_GET_BLOCKS_DELALLOC_RESERVE is not set. So I think we could use rinfo.delonly_block to identify this case.Well, this is similar to the non-bigalloc case I was asking about above. Why the allocated unwritten extent cannot extend past the start of delalloc extent? I didn't find anything that would disallow that...
The same to above, ext4_ext_determine_insert_hole() should work fine for this case. Thanks, Yi.