Thread (5 messages) 5 messages, 3 authors, 2021-08-19

Re: [PATCH V2] btrfs: traverse seed devices if fs_devices::devices is empty in show_devname

From: Su Yue <hidden>
Date: 2021-08-19 06:27:43

On Wed 18 Aug 2021 at 14:48, Anand Jain [off-list ref] 
wrote:
On 18/08/2021 12:19, Su Yue wrote:
quoted
while running btrfs/238 in my test box, the following warning 
occurs
in high chance:

------------[ cut here  ]------------
WARNING: CPU: 3 PID: 481 at fs/btrfs/super.c:2509 
btrfs_show_devname+0x104/0x1e8 [btrfs]
CPU: 2 PID: 1 Comm: systemd Tainted: G        W  O 
5.14.0-rc1-custom #72
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
   btrfs_show_devname+0x108/0x1b4 [btrfs]
   show_mountinfo+0x234/0x2c4
   m_show+0x28/0x34
   seq_read_iter+0x12c/0x3c4
   vfs_read+0x29c/0x2c8
   ksys_read+0x80/0xec
   __arm64_sys_read+0x28/0x34
   invoke_syscall+0x50/0xf8
   do_el0_svc+0x88/0x138
   el0_svc+0x2c/0x8c
   el0t_64_sync_handler+0x84/0xe4
   el0t_64_sync+0x198/0x19c
---[ end trace 3efd7e5950b8af05  ]---
quoted
It's also reproducible by creating a sprout filesystem and 
reading
/proc/self/mounts in parallel.
  ok. This explains.
quoted
The warning is produced if btrfs_show_devname() can't find any 
available
device in fs_info->fs_devices->devices which is protected by 
RCU.
quoted
The warning is desirable to exercise there is at least one 
device in the
mounted filesystem. However, it's not always true for a 
sprouting fs.

Right. When the code is running from line 2596 (including) until 
line
2607, there are chances that the fs_info->fs_devices->devices 
list is
empty. Or those devices are moving to 
fs_info->fs_devices->seed_list.


2596                 ret = btrfs_prepare_sprout(fs_info);
2597                 if (ret) {
2598                         btrfs_abort_transaction(trans, 
ret);
2599                         goto error_trans;
2600                 }
2601         }
2602
2603         device->fs_devices = fs_devices;
2604
2605         mutex_lock(&fs_devices->device_list_mutex);
2606         mutex_lock(&fs_info->chunk_mutex);
2607         list_add_rcu(&device->dev_list, 
&fs_devices->devices);

quoted
While a new device is being added into fs to be sprouted, call 
stack is:
  btrfs_ioctl_add_dev
   btrfs_init_new_device
     btrfs_prepare_sprout
       list_splice_init_rcu(&fs_devices->devices, 
       &seed_devices->devices,
       synchronize_rcu);
     list_add_rcu(&device->dev_list, &fs_devices->devices);

Looking at btrfs_prepare_sprout(), every new RCU reader will 
read a
empty fs_devices->devices once synchronize_rcu() is called.
After commit 4faf55b03823 ("btrfs: don't traverse into the seed 
devices
in show_devname"), btrfs_show_devname() won't looking into
fs_devices->seed_list even there is no device in 
fs_devices->devices.

And since commit 88c14590cdd6 ("btrfs: use RCU in 
btrfs_show_devname for
device list traversal"), btrfs_show_devname() only uses RCU 
without mutex
lock taken for device list traversal. The function read an 
empty
fs_devices->devices and found no device in the list then 
triggers the
warning. The commit just enlarged the window that the fs device 
list
could be empty. Even btrfs_show_devname() uses mutex_lock(), 
there is a
tiny chance to read an empty devices list between 
mutex_unlock() in
btrfs_prepare_sprout() and next mutex_lock() in 
btrfs_init_new_device().

So take device_list_mutex then traverse fs_devices->seed_list 
to seek
for a seed device if no device was found in 
fs_devices->devices.
Since a normal fs always has devices in fs_device->devices and 
the
window is small enough, the mutex lock is not too heavy.

Signed-off-by: Su Yue <redacted>

---
Changelog:
v2:
     Try to traverse fs_devices->seed_list instead of removing 
     the
       WARN_ON().
     Change the subject.
     Add description of fix.
---
  fs/btrfs/super.c | 41 
  ++++++++++++++++++++++++++++++++++++++---
  1 file changed, 38 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d07b18b2b250..31e723eb2ccf 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2482,7 +2482,9 @@ static int btrfs_unfreeze(struct 
super_block *sb)
  static int btrfs_show_devname(struct seq_file *m, struct 
  dentry *root)
  {
  	struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
+	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
  	struct btrfs_device *dev, *first_dev = NULL;
+	struct btrfs_fs_devices *seed_devices;

  	/*
  	 * Lightweight locking of the devices. We should not need
@@ -2492,7 +2494,7 @@ static int btrfs_show_devname(struct 
seq_file *m, struct dentry *root)
  	 * least until the rcu_read_unlock.
  	 */
  	rcu_read_lock();
-	list_for_each_entry_rcu(dev, 
&fs_info->fs_devices->devices, dev_list) {
+	list_for_each_entry_rcu(dev, &fs_devices->devices, 
dev_list) {
  		if (test_bit(BTRFS_DEV_STATE_MISSING, 
  &dev->dev_state))
  			continue;
  		if (!dev->name)
@@ -2503,9 +2505,42 @@ static int btrfs_show_devname(struct 
seq_file *m, struct dentry *root)

  	if (first_dev)
  		seq_escape(m, rcu_str_deref(first_dev->name), " 
  \t\n\\");
-	else
-		WARN_ON(1);
  	rcu_read_unlock();
+
+	if (first_dev)
+		return 0;
+
+	/*
+	 * While the fs is sprouting, above fs_devices->devices 
could be empty
+	 * if the RCU read happened in the window between when
+	 * fs_devices->devices was spliced into 
seed_devices->devices in
+	 * btrfs_prepare_sprout() and new device is not added to
+	 * fs_devices->devices in btrfs_init_new_device().
+	 *
+	 * Take device_list_mutex to make sure seed_devices has 
been added into
+	 * fs_devices->seed_list then we can traverse it.
+	 */
+	mutex_lock(&fs_devices->device_list_mutex);

possible fix:
  As the problem is from line 2596 to 2607 (above) can we move
     list_add_rcu(&device->dev_list, &fs_devices->devices);
  into btrfs_prepare_sprout() so that it shall reduce the racing 
  window.
Oh..no. It should not work after doing some code pastes.
btrfs_show_devname(), the RCU reader doesn't take the mutex lock.
So there is always a short peroid between the list_add_rcu() and 
the
list_splice_init_rcu() updaters. Or is there another adavnaced RCU 
API?

Thanks.

--
Su
And,
  We have learned that taking device_list_mutex in this thread 
  will end
  up with a lockdep warning. We might need a new fs_info state 
  to
  indicate that FS is sprouting.

Thanks, Anand
quoted
+	list_for_each_entry(seed_devices, &fs_devices->seed_list, 
seed_list) {
+		list_for_each_entry(dev, &seed_devices->devices, 
dev_list) {
+			if (test_bit(BTRFS_DEV_STATE_MISSING, 
&dev->dev_state))
+				continue;
+			if (!dev->name)
+				continue;
+			if (!first_dev || dev->devid < first_dev->devid)
+				first_dev = dev;
+		}
+	}
+
+	if (first_dev) {
+		rcu_read_lock();
+		seq_escape(m, rcu_str_deref(first_dev->name), " 
\t\n\\");
+		rcu_read_unlock();
+	} else {
+		WARN_ON(1);
+	}
+	mutex_unlock(&fs_devices->device_list_mutex);
+
  	return 0;
  }
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help