Re: Combining nodatasum + compression
From: Qu Wenruo <hidden>
Date: 2021-06-11 11:15:55
On 2021/6/11 下午6:55, Martin Raiber wrote:
On 11.06.2021 02:18 Qu Wenruo wrote:quoted
On 2021/6/10 下午10:32, Martin Raiber wrote:quoted
when btrfs is running on a block device that improves integrity (e.g. Ceph), it's usefull to run it with nodatasum to reduce the amount of metadata and random IO. In that case it would also be useful to be able to run it with compression combined with nodatasum as well. I actually found that if nodatasum is specified after compress-force, that it doesn't remove reset the compress/nodatasum bit, while the other way around it doesn't work. That combined with--- linux-5.10.39/fs/btrfs/inode.c.orig 2021-05-31 16:11:03.537017580 +0200 +++ linux-5.10.39/fs/btrfs/inode.c 2021-05-31 16:11:19.461591523 +0200@@ -408,8 +408,7 @@*/ static inline bool inode_can_compress(struct btrfs_inode *inode) { - if (inode->flags & BTRFS_INODE_NODATACOW || - inode->flags & BTRFS_INODE_NODATASUM) + if (inode->flags & BTRFS_INODE_NODATACOW) return false; return true; } should do the trick. > The above code was added with the argument that having no checksums with compression would damage too much data in case of corruption ( https://lore.kernel.org/linux-btrfs/20180515073622.18732-2-wqu@suse.com/ (local) ).It doesn't make a difference whether it's a single device fs or not. If we don't have csum, the corruption is not only affecting the sector where the corruption is, but the full compressed extent.Doesn't really matter if it doesn't happen if btrfs is on a e.g. ceph volume.
But this still means, all other regular end user can hit a case where they set nodatasum for a directory and later corrupt their data. This will increase the corruption loss, if user just migrate from older kernel to newer one. Yeah, you can blame the users for doing that, but I'm still not that sure if such behavior change is fine. Especially when this increase the chance of data loss. It may be fine for ceph, but btrfs still have a wider user base to consider.
Furthermore it depends on the use-case if corruption affecting one page, or a whole compressed block is something one can live with.
That's true, but my point is, at least we shouldn't increase the data loss possibility for the end users. If in the past, we allowed NODATASUM + compression, then converting to current situation, I'm more or less fine, since it reduce the possibility of data loss. But not in the other direction.
quoted
Furthermore, it's not that simple. Current code we always expect compressed read to find some csum. Just check btrfs_submit_compressed_read(), it will call btrfs_lookup_bio_sums(). Which will fill the csum array with 0 if it can not find any csum.It seems to make that conditional w.r.t. BTRFS_INODE_NODATASUM if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) { ret = btrfs_lookup_bio_sums(inode, comp_bio, (u64)-1, sums); BUG_ON(ret); /* -ENOMEM */ }and in check_compressed_csum: if (inode->flags & BTRFS_INODE_NODATASUM) return 0; This seems to have been moved around recently ( https://github.com/torvalds/linux/commit/334c16d82cfe180f7b262a6f8ae2d9379f032b18 ).
Forgot that patch, I should look one function deeper to confirm the behavior. That indeed reduces my concerns from the code point of view. Thanks, Qu
quoted
Then at endio callbacks, we verify the csum against the data we read, if it's all zero, the csum will definitely mismatch and discard the data no matter if it's correct or not. The same thing applies to btrfs_submit_compressed_write(), it will always generate csum.Same here: int skip_sum = inode->flags & BTRFS_INODE_NODATASUM; ... if (!skip_sum) { ret = btrfs_csum_one_bio(inode, bio, start, 1); BUG_ON(ret); /* -ENOMEM */ }quoted
The diff will just give you a false sense of compression without csum. It will still generate csum for write and relies on csum check at read time. >>