Re: [PATCH] btrfs: free device if we fail to open it

From: Anand Jain <hidden>
Date: 2021-12-02 07:09:54

On 02/12/2021 05:18, Josef Bacik wrote:

We've been having transient failures of btrfs/197 because sometimes we
don't show a missing device.

This turned out to be because I use LVM for my devices, and sometimes we
race with udev and get the "/dev/dm-#" device registered as a device in
the fs_devices list instead of "/dev/mapper/vg0-lv#" device.
 Thus when
the test adds a device to the existing mount it doesn't find the old
device in the original fs_devices list and remove it properly.

  The above para is confusing. It can go. IMHO. The device path shouldn't
  matter as we depend on the bdev to compare in the device add thread.

2637         bdev = blkdev_get_by_path(device_path, FMODE_WRITE | 
FMODE_EXCL,
2638                                   fs_info->bdev_holder);
::
2657         list_for_each_entry_rcu(device, &fs_devices->devices, 
dev_list) {
2658                 if (device->bdev == bdev) {
2659                         ret = -EEXIST;
2660                         rcu_read_unlock();
2661                         goto error;
2662                 }
2663         }

This is fine in general, because when we open the devices we check the
UUID, and properly skip using the device that we added to the other file
system.  However we do not remove it from our fs_devices,

Hmm, context/thread is missing. Like, is it during device add or during 
mkfs/dev-scan?

AFAIK btrfs_free_stale_devices() checks and handles the removing of 
stale devices in the fs_devices in both the contexts/threads device-add, 
mkfs (device-scan).

  For example:

$ alias cnt_free_stale_devices="bpftrace -e 
'kprobe:btrfs_free_stale_devices { @ = count(); }'"

New FSID on 2 devices, we call free_stale_devices():

$ cnt_free_stale_devices -c 'mkfs.btrfs -fq -draid1 -mraid1 
/dev/vg/scratch0 /dev/vg/scratch1'
Attaching 1 probe...

@: 2

  We do it only when there is a new fsid/device added to the fs_devices.

For example:

Clean up the fs_devices:
$ cnt_free_stale_devices -c 'btrfs dev scan --forget'
Attaching 1 probe...

@: 1

Now mounting devices are new to the fs_devices list, so call 
free_stale_devices().

$ cnt_free_stale_devices -c 'mount -o device=/dev/vg/scratch0 
/dev/vg/scratch1 /btrfs'
Attaching 1 probe...

@: 2

$ umount /btrfs

Below we didn't call free_stale_devices() because these two devices are 
already in the appropriate fs_devices.

$ cnt_free_stale_devices -c 'mount -o device=/dev/vg/scratch0 
/dev/vg/scratch1 /btrfs'
Attaching 1 probe...

@: 0

$

To me, it looks to be working correctly.

so when we get
the device info we still see this old device and think that everything
is ok.

We have a check for -ENODATA coming back from reading the super off of
the device to catch the wipefs case, but we don't catch literally any
other error where the device is no longer valid and thus do not remove
the device.

Fix this to not special case an empty device when reading the super
block, and simply remove any device we are unable to open.

With this fix we properly print out missing devices in the test case,
and after 500 iterations I'm no longer able to reproduce the problem,
whereas I could usually reproduce within 200 iterations.

commit 7f551d969037 ("btrfs: free alien device after device add")
fixed the case we weren't freeing the stale device in the device-add 
context.

However, here I don't understand the thread/context we are missing to 
free the stale device here.

quoted hunk ↗ jump to hunk

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
  fs/btrfs/disk-io.c | 2 +-
  fs/btrfs/volumes.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5c598e124c25..fa34b8807f8d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c

@@ -3924,7 +3924,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
  	super = page_address(page);
  	if (btrfs_super_magic(super) != BTRFS_MAGIC) {
  		btrfs_release_disk_super(super);
-		return ERR_PTR(-ENODATA);
+		return ERR_PTR(-EINVAL);
  	}

  I think you are removing ENODATA because it is going away in the 
parent function. And, we don't reach this condition in the test case 
btrfs/197.
  Instead, we fail here at line 645 below and, we return -EINVAL;

  645         if (memcmp(device->uuid, disk_super->dev_item.uuid, 
BTRFS_UUID_SIZE))
  646                 goto error_free_page;
  647

  687 error_free_page:
  688         btrfs_release_disk_super(disk_super);
  689         blkdev_put(bdev, flags);
  690
  691         return -EINVAL;

function stack carries the return code to the parent open_fs_devices().

open_fs_devices()
  btrfs_open_one_device()
   btrfs_get_bdev_and_sb()
    btrfs_read_dev_super()
     btrfs_read_dev_one_super()

quoted hunk ↗ jump to hunk

  	if (btrfs_super_bytenr(super) != bytenr_orig) {

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f38c230111be..890153dd2a2b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c

@@ -1223,7 +1223,7 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
  		if (ret == 0 &&
  		    (!latest_dev || device->generation > latest_dev->generation)) {
  			latest_dev = device;
-		} else if (ret == -ENODATA) {
+		} else if (ret) {
  			fs_devices->num_devices--;
  			list_del(&device->dev_list);
  			btrfs_free_device(device);

There are many other cases, for example including -EACCES (from 
blkdev_get_by_path()) (I haven't checked other error codes). For which, 
calling btrfs_free_device() is inappropriate.

To be safer, how about just checking for error codes explicitly for 
-EINVAL? as we could probably free the device when it is -EINVAL.

Thanks, Anand

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help