Re: [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname
From: Su Yue <hidden>
Date: 2021-08-19 06:27:43
On Wed 18 Aug 2021 at 14:48, Anand Jain [off-list ref] wrote:
On 18/08/2021 12:19, Su Yue wrote:quoted
while running btrfs/238 in my test box, the following warning occurs in high chance: ------------[ cut here ]------------ WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 btrfs_show_devname+0x104/0x1e8 [btrfs] CPU: 2 PID: 1 Comm: systemd Tainted: G W O 5.14.0-rc1-custom #72 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: btrfs_show_devname+0x108/0x1b4 [btrfs] show_mountinfo+0x234/0x2c4 m_show+0x28/0x34 seq_read_iter+0x12c/0x3c4 vfs_read+0x29c/0x2c8 ksys_read+0x80/0xec __arm64_sys_read+0x28/0x34 invoke_syscall+0x50/0xf8 do_el0_svc+0x88/0x138 el0_svc+0x2c/0x8c el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x198/0x19c ---[ end trace 3efd7e5950b8af05 ]---quoted
It's also reproducible by creating a sprout filesystem and reading /proc/self/mounts in parallel.ok. This explains.quoted
The warning is produced if btrfs_show_devname() can't find any available device in fs_info->fs_devices->devices which is protected by RCU.quoted
The warning is desirable to exercise there is at least one device in the mounted filesystem. However, it's not always true for a sprouting fs.Right. When the code is running from line 2596 (including) until line 2607, there are chances that the fs_info->fs_devices->devices list is empty. Or those devices are moving to fs_info->fs_devices->seed_list. 2596 ret = btrfs_prepare_sprout(fs_info); 2597 if (ret) { 2598 btrfs_abort_transaction(trans, ret); 2599 goto error_trans; 2600 } 2601 } 2602 2603 device->fs_devices = fs_devices; 2604 2605 mutex_lock(&fs_devices->device_list_mutex); 2606 mutex_lock(&fs_info->chunk_mutex); 2607 list_add_rcu(&device->dev_list, &fs_devices->devices);quoted
While a new device is being added into fs to be sprouted, call stack is: btrfs_ioctl_add_dev btrfs_init_new_device btrfs_prepare_sprout list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices, synchronize_rcu); list_add_rcu(&device->dev_list, &fs_devices->devices); Looking at btrfs_prepare_sprout(), every new RCU reader will read a empty fs_devices->devices once synchronize_rcu() is called. After commit 4faf55b03823 ("btrfs: don't traverse into the seed devices in show_devname"), btrfs_show_devname() won't looking into fs_devices->seed_list even there is no device in fs_devices->devices. And since commit 88c14590cdd6 ("btrfs: use RCU in btrfs_show_devname for device list traversal"), btrfs_show_devname() only uses RCU without mutex lock taken for device list traversal. The function read an empty fs_devices->devices and found no device in the list then triggers the warning. The commit just enlarged the window that the fs device list could be empty. Even btrfs_show_devname() uses mutex_lock(), there is a tiny chance to read an empty devices list between mutex_unlock() in btrfs_prepare_sprout() and next mutex_lock() in btrfs_init_new_device(). So take device_list_mutex then traverse fs_devices->seed_list to seek for a seed device if no device was found in fs_devices->devices. Since a normal fs always has devices in fs_device->devices and the window is small enough, the mutex lock is not too heavy. Signed-off-by: Su Yue <redacted> --- Changelog: v2: Try to traverse fs_devices->seed_list instead of removing the WARN_ON(). Change the subject. Add description of fix. --- fs/btrfs/super.c | 41 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-)diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index d07b18b2b250..31e723eb2ccf 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c@@ -2482,7 +2482,9 @@ static int btrfs_unfreeze(structsuper_block *sb) static int btrfs_show_devname(struct seq_file *m, struct dentry *root) { struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb); + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; struct btrfs_device *dev, *first_dev = NULL; + struct btrfs_fs_devices *seed_devices; /* * Lightweight locking of the devices. We should not need@@ -2492,7 +2494,7 @@ static int btrfs_show_devname(structseq_file *m, struct dentry *root) * least until the rcu_read_unlock. */ rcu_read_lock(); - list_for_each_entry_rcu(dev, &fs_info->fs_devices->devices, dev_list) { + list_for_each_entry_rcu(dev, &fs_devices->devices, dev_list) { if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) continue; if (!dev->name)@@ -2503,9 +2505,42 @@ static int btrfs_show_devname(structseq_file *m, struct dentry *root) if (first_dev) seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\"); - else - WARN_ON(1); rcu_read_unlock(); + + if (first_dev) + return 0; + + /* + * While the fs is sprouting, above fs_devices->devices could be empty + * if the RCU read happened in the window between when + * fs_devices->devices was spliced into seed_devices->devices in + * btrfs_prepare_sprout() and new device is not added to + * fs_devices->devices in btrfs_init_new_device(). + * + * Take device_list_mutex to make sure seed_devices has been added into + * fs_devices->seed_list then we can traverse it. + */ + mutex_lock(&fs_devices->device_list_mutex);possible fix: As the problem is from line 2596 to 2607 (above) can we move list_add_rcu(&device->dev_list, &fs_devices->devices); into btrfs_prepare_sprout() so that it shall reduce the racing window.
Oh..no. It should not work after doing some code pastes. btrfs_show_devname(), the RCU reader doesn't take the mutex lock. So there is always a short peroid between the list_add_rcu() and the list_splice_init_rcu() updaters. Or is there another adavnaced RCU API? Thanks. -- Su
And, We have learned that taking device_list_mutex in this thread will end up with a lockdep warning. We might need a new fs_info state to indicate that FS is sprouting. Thanks, Anandquoted
+ list_for_each_entry(seed_devices, &fs_devices->seed_list, seed_list) { + list_for_each_entry(dev, &seed_devices->devices, dev_list) { + if (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state)) + continue; + if (!dev->name) + continue; + if (!first_dev || dev->devid < first_dev->devid) + first_dev = dev; + } + } + + if (first_dev) { + rcu_read_lock(); + seq_escape(m, rcu_str_deref(first_dev->name), " \t\n\\"); + rcu_read_unlock(); + } else { + WARN_ON(1); + } + mutex_unlock(&fs_devices->device_list_mutex); + return 0; }