Re: Bulk discard doesn't work after add/delete of devices

From: Liu Bo <hidden>
Date: 2012-02-09 08:42:00
Subsystem: btrfs file system, filesystems (vfs and infrastructure), the rest · Maintainers: Chris Mason, David Sterba, Alexander Viro, Christian Brauner, Linus Torvalds

On 02/06/2012 04:37 AM, Lutz Euler wrote:

> ... maybe even the block group cache is nonfunctional then?
>
> I am using a btrfs file system, mirrored data and metadata on two SSDs,
> and recently tried for the first time to run fstrim on it. fstrim
> unexpectedly said "0 bytes were trimmed". strace -T shows that it spends
> only a few microseconds in the ioctl system call (basically the overhead
> of strace, it seems), so I engaged in some "printk" debugging and found
> that after "btrfs_trim_fs" executes its first statement,
>
>   cache = btrfs_lookup_block_group(fs_info, range->start);
>
> cache is 0. As the file system was created with a very recent kernel and
> always mounted with the default "space_cache" option I guessed that this
> might have something to do with the fact that I exchanged both the
> filesystem's devices earlier (as you can see from the devid's in the
> following output -- this is the only btrfs file system on the machine):
>
> # btrfs fi show /dev/sda3
> Label: none  uuid: 88af7576-3027-4a3b-a5ae-34bfd167982f
> 	Total devices 2 FS bytes used 28.13GB
> 	devid    4 size 74.53GB used 38.06GB path /dev/sdb1
> 	devid    3 size 75.24GB used 38.06GB path /dev/sda3
>
> ("exchanged" means: I created the filesystem as a mirror of sdb1 and
> sdb2, put data on it, then added sda3, balanced, deleted sdb2, balanced
> again, then unmounted, repartitioned sdb1 and sdb2 together as a larger
> sdb1, mounted degraded, added sdb1, balanced and deleted "missing".)
>
> So I tried to reproduce this and got the following:
>
> # dd if=/dev/zero of=img0 bs=1 count=0 seek=5G
> # dd if=/dev/zero of=img1 bs=1 count=0 seek=5G
> # losetup -f img0
> # losetup -f img1
> # mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1
> # mount /dev/loop0 /mnt
> # fstrim -v /mnt
> /mnt: 4332265472 bytes were trimmed
>
> (The above mentioned kernel instrumentation shows that "cache" was
> not 0 here.)
>
> # dd if=/dev/zero of=img2 bs=1 count=0 seek=5G
> # losetup -f img2
> # btrfs device add /dev/loop2 /mnt
> # btrfs device delete /dev/loop0 /mnt
> # fstrim -v /mnt
> /mnt: 0 bytes were trimmed
>
> ("cache" was 0 here.)
>
> Software versions: See my earlier mail about degraded mirrors;
> reproducing the issue was on v3.3-rc2-172-g23783f8. I built fstrim
> from git, that is, util-linux pulled yesterday from git.kernel.org.
>
> What's going on here?
> If the space cache is indeed broken on this filesystem, can I repair
> it without risk to my data?
> By mounting with "nospace_cache" once or somehow using "clear_cache"?

Hi Lutz,

Would you please test the following patch on your box?

thanks,
liubo

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 77ea23c..b6e2c92 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7653,9 +7653,16 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 	u64 start;
 	u64 end;
 	u64 trimmed = 0;
+	u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
 	int ret = 0;
 
-	cache = btrfs_lookup_block_group(fs_info, range->start);
+	/*
+	 * Trimming the whole FS: the first block group may not start at 0.
+	 */
+	if (range->len == total_bytes)
+		cache = btrfs_lookup_first_block_group(fs_info, range->start);
+	else
+		cache = btrfs_lookup_block_group(fs_info, range->start);
 
 	while (cache) {
 		if (cache->key.objectid >= (range->start + range->len)) {

-- 
1.6.5.2
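
To illustrate what the patch changes, here is a minimal userspace sketch
of the two lookup behaviours. It is not kernel code: the toy_* names and
the sorted-array representation are assumptions made up for this example
(the kernel keeps block groups in an rbtree). It only models the
difference between a lookup that requires a block group to contain the
given offset, as btrfs_lookup_block_group() does, and one that returns
the first block group starting at or after it, as
btrfs_lookup_first_block_group() does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block group covering [start, start + len). */
struct toy_block_group {
	uint64_t start;
	uint64_t len;
};

/*
 * Models a "contains" lookup: return the group that covers bytenr,
 * or NULL if no group covers it.
 */
static const struct toy_block_group *
toy_lookup_containing(const struct toy_block_group *bg, size_t n,
		      uint64_t bytenr)
{
	for (size_t i = 0; i < n; i++)
		if (bytenr >= bg[i].start && bytenr - bg[i].start < bg[i].len)
			return &bg[i];
	return NULL;
}

/*
 * Models a "first" lookup: return the first group whose start is at or
 * after bytenr. The array is assumed to be sorted by start.
 */
static const struct toy_block_group *
toy_lookup_first(const struct toy_block_group *bg, size_t n, uint64_t bytenr)
{
	for (size_t i = 0; i < n; i++)
		if (bg[i].start >= bytenr)
			return &bg[i];
	return NULL;
}

/*
 * Mirrors the selection made in the patch: use the "first" lookup only
 * when the requested length equals the total size of the filesystem.
 */
static const struct toy_block_group *
toy_trim_start(const struct toy_block_group *bg, size_t n,
	       uint64_t start, uint64_t len, uint64_t total_bytes)
{
	if (len == total_bytes)
		return toy_lookup_first(bg, n, start);
	return toy_lookup_containing(bg, n, start);
}
```

With a layout in which every block group starts above offset 0 (as can
happen once the original block groups have been relocated), the
"contains" lookup at offset 0 returns NULL and the trim loop never runs,
whereas the "first" lookup still finds the lowest block group.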


> Greetings,
> 
> Lutz
