Re: Combining nodatasum + compression
From: Qu Wenruo <hidden>
Date: 2021-06-11 00:18:41
On 2021/6/10 10:32 PM, Martin Raiber wrote:
> Hi,
>
> when btrfs is running on a block device that improves integrity (e.g.
> Ceph), it's useful to run it with nodatasum to reduce the amount of
> metadata and random IO. In that case it would also be useful to be able
> to run it with compression combined with nodatasum as well.
>
> I actually found that if nodatasum is specified after compress-force,
> it doesn't reset the compress/nodatasum bit, while the other way around
> it doesn't work.
>
> That combined with
>
> --- linux-5.10.39/fs/btrfs/inode.c.orig 2021-05-31 16:11:03.537017580 +0200
> +++ linux-5.10.39/fs/btrfs/inode.c 2021-05-31 16:11:19.461591523 +0200
> @@ -408,8 +408,7 @@
>   */
>  static inline bool inode_can_compress(struct btrfs_inode *inode)
>  {
> -	if (inode->flags & BTRFS_INODE_NODATACOW ||
> -	    inode->flags & BTRFS_INODE_NODATASUM)
> +	if (inode->flags & BTRFS_INODE_NODATACOW)
>  		return false;
>  	return true;
>  }
>
> should do the trick.
>
> The above code was added with the argument that having no checksums
> with compression would damage too much data in case of corruption
> ( https://lore.kernel.org/linux-btrfs/20180515073622.18732-2-wqu@suse.com/ ).
>
> This argument doesn't apply much to single device file systems and even
> less to file systems on Ceph-like volumes.

It doesn't make a difference whether it's a single device fs or not.

If we don't have csum, the corruption is not only affecting the sector
where the corruption is, but the full compressed extent.

Furthermore, it's not that simple. The current code always expects a
compressed read to find some csum.

Just check btrfs_submit_compressed_read(): it will call
btrfs_lookup_bio_sums(), which will fill the csum array with 0 if it
cannot find any csum.

Then, in the endio callbacks, we verify the csum against the data we
read. If the csum is all zero, it will definitely mismatch, and the data
is discarded no matter whether it's correct or not.

The same thing applies to btrfs_submit_compressed_write(): it will
always generate csum.

The diff will just give you a false sense of compression without csum.
It will still generate csum for write and rely on the csum check at
read time.

Thanks,
Qu
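
For readers who want to see why a zero-filled csum array rejects even
intact data, here is a minimal userspace sketch of the read-side check
described in the reply. It is not kernel code: toy_csum(),
toy_lookup_csum() and toy_read_ok() are hypothetical names, and a 32-bit
FNV-1a hash stands in for the real btrfs checksum purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit FNV-1a hash; an assumption standing in for the fs checksum. */
static uint32_t toy_csum(const uint8_t *data, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/*
 * Hypothetical stand-in for the csum lookup step: copy the stored csum
 * if one exists, otherwise leave the expected value zero-filled.
 */
static void toy_lookup_csum(const uint32_t *stored, uint32_t *expected)
{
	*expected = stored ? *stored : 0;
}

/*
 * Hypothetical stand-in for the end-of-read verification: the data is
 * accepted only if its computed csum equals the looked-up one.
 */
static bool toy_read_ok(const uint8_t *data, size_t len,
			const uint32_t *stored)
{
	uint32_t expected;

	assert(data != NULL);
	toy_lookup_csum(stored, &expected);
	return toy_csum(data, len) == expected;
}
```

With a stored csum, toy_read_ok() accepts the buffer. With stored ==
NULL (the nodatasum case in this model) the expected value is zero, so
the same uncorrupted buffer is rejected unless its hash happens to be
zero.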