Re: [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname
From: Anand Jain <hidden>
Date: 2021-08-18 06:48:56
On 18/08/2021 12:19, Su Yue wrote:
while running btrfs/238 in my test box, the following warning occurs in high chance: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs] CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_show_devname+0x108/0x1b4 [btrfs] show_mountinfo+0x234/0x2c4 m_show+0x28/0x34 seq_read_iter+0x12c/0x3c4 vfs_read+0x29c/0x2c8 ksys_read+0x80/0xec __arm64_sys_read+0x28/0x34 invoke_syscall+0x50/0xf8 do_el0_svc+0x88/0x138 el0_svc+0x2c/0x8c el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x198/0x19c ---[ end trace 3efd7e5950b8af05 ]---
It's also reproducible by creating a sprout filesystem and reading /proc/self/mounts in parallel.
ok. This explains.
The warning is produced if btrfs_show_devname() can't find any available device in fs_info->fs_devices->devices which is protected by RCU.
The warning is desirable to exercise there is at least one device in the mounted filesystem. However, it's not always true for a sprouting fs.
Right. When the code is running from line 2596 (including) until line
2607, there are chances that the fs_info->fs_devices->devices list is
empty. Or those devices are moving to fs_info->fs_devices->seed_list.
2596 ret = btrfs_prepare_sprout(fs_info);
2597 if (ret) {
2598 btrfs_abort_transaction(trans, ret);
2599 goto error_trans;
2600 }
2601 }
2602
2603 device->fs_devices = fs_devices;
2604
2605 mutex_lock(&fs_devices->device_list_mutex);
2606 mutex_lock(&fs_info->chunk_mutex);
2607 list_add_rcu(&device->dev_list, &fs_devices->devices);
quoted hunk ↗ jump to hunk
While a new device is being added into fs to be sprouted, call stack is: btrfs_ioctl_add_dev btrfs_init_new_device btrfs_prepare_sprout list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices, synchronize_rcu); list_add_rcu(&device->dev_list, &fs_devices->devices); Looking at btrfs_prepare_sprout(), every new RCU reader will read a empty fs_devices->devices once synchronize_rcu() is called. After commit 4faf55b03823 ("btrfs: don't traverse into the seed devices in show_devname"), btrfs_show_devname() won't looking into fs_devices->seed_list even there is no device in fs_devices->devices. And since commit 88c14590cdd6 ("btrfs: use RCU in btrfs_show_devname for device list traversal"), btrfs_show_devname() only uses RCU without mutex lock taken for device list traversal. The function read an empty fs_devices->devices and found no device in the list then triggers the warning. The commit just enlarged the window that the fs device list could be empty. Even btrfs_show_devname() uses mutex_lock(), there is a tiny chance to read an empty devices list between mutex_unlock() in btrfs_prepare_sprout() and next mutex_lock() in btrfs_init_new_device(). So take device_list_mutex then traverse fs_devices->seed_list to seek for a seed device if no device was found in fs_devices->devices. Since a normal fs always has devices in fs_device->devices and the window is small enough, the mutex lock is not too heavy. Signed-off-by: Su Yue <redacted> --- Changelog: v2: Try to traverse fs_devices->seed_list instead of removing the WARN_ON(). Change the subject. Add description of fix. --- fs/btrfs/super.c | 41 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-)diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index d07b18b2b250..31e723eb2ccf 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c@@ -2482,7 +2482,9 @@ static int btrfs_unfreeze(struct super_block *sb) static int btrfs_show_devname(struct seq_file *m, struct dentry *root) { struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *dev, *first_dev = NULL; + struct btrfs_fs_devices *seed_devices; /* * Lightweight locking of the devices. We should not need@@ -2492,7 +2494,7 @@ static int btrfs_show_devname(struct seq_file *m, struct dentry *root) * least until the rcu_read_unlock. */ rcu_read_lock(); - list_for_each_entry_rcu(dev, &fs_info->fs_devices->devices, dev_list) { + list_for_each_entry_rcu(dev, &fs_devices->devices, dev_list) { if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) continue; if (!dev->name)@@ -2503,9 +2505,42 @@ static int btrfs_show_devname(struct seq_file *m, struct dentry *root) if (first_dev) seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\"); - else - WARN_ON(1); rcu_read_unlock(); + + if (first_dev) + return 0; + + /* + * While the fs is sprouting, above fs_devices->devices could be empty + * if the RCU read happened in the window between when + * fs_devices->devices was spliced into seed_devices->devices in + * btrfs_prepare_sprout() and new device is not added to + * fs_devices->devices in btrfs_init_new_device(). + * + * Take device_list_mutex to make sure seed_devices has been added into + * fs_devices->seed_list then we can traverse it. + */ + mutex_lock(&fs_devices->device_list_mutex);
possible fix:
As the problem is from line 2596 to 2607 (above) can we move
list_add_rcu(&device->dev_list, &fs_devices->devices);
into btrfs_prepare_sprout() so that it shall reduce the racing window.
And,
We have learned that taking device_list_mutex in this thread will end
up with a lockdep warning. We might need a new fs_info state to
indicate that FS is sprouting.
Thanks, Anand
+ list_for_each_entry(seed_devices, &fs_devices->seed_list, seed_list) {
+ list_for_each_entry(dev, &seed_devices->devices, dev_list) {
+ if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state))
+ continue;
+ if (!dev->name)
+ continue;
+ if (!first_dev || dev->devid < first_dev->devid)
+ first_dev = dev;
+ }
+ }
+
+ if (first_dev) {
+ rcu_read_lock();
+ seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\");
+ rcu_read_unlock();
+ } else {
+ WARN_ON(1);
+ }
+ mutex_unlock(&fs_devices->device_list_mutex);
+
return 0;
}