Re: [PATCH v2 3/7] btrfs: do not read super look for a device path
From: Josef Bacik <josef@toxicpanda.com>
Date: 2021-09-27 15:32:39
On 8/24/21 10:00 PM, Anand Jain wrote:
On 28/07/2021 05:01, Josef Bacik wrote:quoted
For device removal and replace we call btrfs_find_device_by_devspec, which if we give it a device path and nothing else will call btrfs_find_device_by_path, which opens the block device and reads the super block and then looks up our device based on that. However this is completely unnecessary because we have the path stored in our device on our fsdevices. All we need to do if we're given a path is look through the fs_devices on our file system and use that device if we find it, reading the super block is just silly.The device path as stored in our fs_devices can differ from the path provided by the user for the same device (for example, dm, lvm). btrfs-progs sanitize the device path but, others (for example, an ioctl test case) might not. And the path lookup would fail. Also, btrfs dev scan <path> can update the device path anytime, even after it is mounted. Fixing that failed the subsequent subvolume mounts (if I remember correctly).
This is a good point, that's kind of a big deal from a UX perspective.
quoted
This fixes the case where we end up with our sb write "lock" getting the dependency of the block device ->open_mutex, which resulted in the following lockdep splatCan we do.. btrfs_exclop_start() :: find device part (read sb) :: mnt_want_write_file()?
I looked into this, but we'd have to re-order the exclop_start to above the mnt_want_write_file() part everywhere to be consistent, and this is mostly OK except for balance. Balance the exclop is tied to the lifetime of the balance ctl, which can exist past the task running balance because we could pause the balance. Could we get around this? Sure, but in my head exclop == lock. This means we have something akin to exclop_start mnt_want_write_file() pause balance mnt_drop_write() resume balance exclop_start magic stuff in balance to resume without doing the exclop mnt_want_write_file() <do balance> exclop_finish mnt_drop_write() If we're OK with this then we can definitely do that. The other option is simply to make userspace do the superblock read and use the devid thing for us. Then we just eat the UX problem for older tools where you want to do btrfs rm device /dev/mapper/whatever and we have the pathname as /dev/dm-#. Both options are unattractive in their own way. I think the first option is only annoying to us, and maintains the UX expectations. But I want more than me to make this decision, so if you and Dave are OK with that I'll go with re-ordering exclop+mnt_want_write_file(), and then put the device lookup between the two of them for device removal. Thanks, Josef