Re: [RFC] btrfs: Allow read-only mount with corrupted extent tree
From: Dāvis Mosāns <hidden>
Date: 2021-03-21 21:56:08
On Sat, 20 March 2021 at 02:34, Qu Wenruo ([off-list ref]) wrote:
On 2021/3/19 11:34 PM, Dāvis Mosāns wrote:
On Thu, 18 March 2021 at 01:49, Qu Wenruo ([off-list ref]) wrote:
On 2021/3/18 5:03 AM, Dāvis Mosāns wrote:
On Wed, 17 March 2021 at 12:28, Qu Wenruo ([off-list ref]) wrote:
On 2021/3/17 9:29 AM, Dāvis Mosāns wrote:
On Wed, 17 March 2021 at 03:18, Dāvis Mosāns ([off-list ref]) wrote:
Currently if there's any corruption at all in extent tree
(eg. even single bit) then mounting will fail with:
"failed to read block groups: -5" (-EIO)
It happens because we immediately abort on first error when
searching in extent tree for block groups.

Now with this patch if `ignorebadroots` option is specified
then we handle such case and continue by removing already
created block groups and creating dummy block groups.

Signed-off-by: Dāvis Mosāns <redacted>
---
 fs/btrfs/block-group.c | 14 ++++++++++++++
 fs/btrfs/disk-io.c     |  4 ++--
 fs/btrfs/disk-io.h     |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 48ebc106a606..827a977614b3 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2048,6 +2048,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
+
+	if (ret == -EIO && btrfs_test_opt(info, IGNOREBADROOTS)) {
+		btrfs_put_block_group_cache(info);
+		btrfs_stop_all_workers(info);
+		btrfs_free_block_groups(info);
+		ret = btrfs_init_workqueues(info, NULL);
+		if (ret)
+			return ret;
+		ret = btrfs_init_space_info(info);
+		if (ret)
+			return ret;
+		return fill_dummy_bgs(info);

When we hit bad things in extent tree, we should ensure we're mounting
the fs RO, or we can't continue.

And we should also refuse to mount back to RW if we hit such case, so
that we don't need anything complex, just ignore the whole extent tree
and create the dummy block groups.

That's what we're doing here, `ignorebadroots` implies RO mount and
without specifying it doesn't mount at all.
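
A minimal userspace sketch of the fallback discussed above, to make the control flow easier to follow. It is not the kernel code: `struct mount_state` and `finish_block_group_read()` are hypothetical names, and modelling the dummy block groups as one placeholder per chunk is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount state; not struct btrfs_fs_info. */
struct mount_state {
	bool ignorebadroots;    /* models the `ignorebadroots` option */
	int chunk_count;        /* chunks known from the chunk tree */
	int real_block_groups;  /* block groups read from the extent tree */
	int dummy_block_groups; /* placeholders created by the fallback */
};

/*
 * Models the tail of the block group read: scan_ret is the result of
 * searching the extent tree. On -EIO with ignorebadroots set, drop the
 * partially loaded block groups and substitute dummy ones; otherwise
 * pass the result through unchanged.
 */
int finish_block_group_read(struct mount_state *st, int scan_ret)
{
	if (scan_ret == -EIO && st->ignorebadroots) {
		st->real_block_groups = 0;                /* discard partial state */
		st->dummy_block_groups = st->chunk_count; /* assumed: one per chunk */
		return 0;
	}
	return scan_ret;
}
```

Without the option the error is returned as before and the mount fails, which matches the behaviour described in the commit message.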
This isn't that nice, but I don't really know how to properly clean up
everything related to already created block groups, so this was the
easiest way. It seems to work fine.
But it looks like we need to do something about log replay as well,
because if it's not disabled then it fails with:

[ 1397.246869] BTRFS info (device sde): start tree-log replay
[ 1398.218685] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x22ab750a level 0
[ 1398.218803] BTRFS warning (device sde): sde checksum verify failed on 21057127661568 wanted 0xd1506ed9 found 0x7dd54bb9 level 0
[ 1398.218813] BTRFS: error (device sde) in __btrfs_free_extent:3054: errno=-5 IO failure
[ 1398.218828] BTRFS: error (device sde) in btrfs_run_delayed_refs:2124: errno=-5 IO failure
[ 1398.219002] BTRFS: error (device sde) in btrfs_replay_log:2254: errno=-5 IO failure (Failed to recover log tree)
[ 1398.229048] BTRFS error (device sde): open_ctree failed

This is because we shouldn't allow any write to the fs if we have
anything wrong in the extent tree.

This is happening when mounting read-only. My assumption is that it
only tries to replay in memory without writing anything to disk.

We lack the check on the log tree.

Normally for such a forced RO mount, log replay is not allowed.

We should output a warning to prompt the user to use nologreplay, and
reject the mount.

I'm not familiar with log replay, but couldn't there be something useful
(ignoring ref counts) that would still be worth replaying in memory?

Log replay means metadata write.

Any write needs a valid extent tree to find free space for new
metadata/data.

So no, we can't do anything but completely ignore the log.
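
The check suggested above (warn and reject the mount when a log tree exists and nologreplay was not given) could be sketched roughly as follows. This is a userspace model under stated assumptions, not the updated patch: `struct mount_opts` and `check_log_replay_allowed()` are hypothetical names, and returning -EINVAL is an assumed choice of error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two mount options involved. */
struct mount_opts {
	bool ignorebadroots; /* forced read-only rescue mount */
	bool nologreplay;    /* user asked to skip log replay */
};

/*
 * Returns 0 if the mount may proceed, or -EINVAL (an assumed error
 * code) if it must be rejected. Log replay writes metadata, which
 * needs a valid extent tree, so with ignorebadroots a pending log
 * can only be skipped, never replayed.
 */
int check_log_replay_allowed(const struct mount_opts *opts, bool has_log_tree)
{
	if (opts->ignorebadroots && has_log_tree && !opts->nologreplay) {
		fprintf(stderr,
			"warning: log tree present, mount again with nologreplay\n");
		return -EINVAL;
	}
	return 0;
}
```

With both options set, or with no log tree on disk, the check passes and nothing is replayed.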
I see, I've updated the patch.
But even then it seems it could be possible to add a new ramdisk and
make allocations there (e.g. create a new extent tree there), thus
allowing replay. I guess that's way too much work.
Anyway, thanks for the feedback!

Best regards,
Dāvis