Thread: 9 messages, 6 authors, 2022-03-07

Re: [PATCH] btrfs: unlock the newly allocated extent buffer in btrfs_alloc_tree_block()

From: Filipe Manana <hidden>
Date: 2021-09-20 10:17:36

On Mon, Sep 20, 2021 at 10:58 AM David Sterba [off-list ref] wrote:
On Tue, Sep 14, 2021 at 02:57:59PM +0800, Qu Wenruo wrote:
[BUG]
There is a very detailed bug report showing that an injected ENOMEM
error could leave a tree block locked when we return to user space:

  BTRFS info (device loop0): enabling ssd optimizations
  FAULT_INJECTION: forcing a failure.
  name failslab, interval 1, probability 0, space 0, times 0
  CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
   fail_dump lib/fault-inject.c:52 [inline]
   should_fail+0x13c/0x160 lib/fault-inject.c:146
   should_failslab+0x5/0x10 mm/slab_common.c:1328
   slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
   slab_alloc_node mm/slub.c:3120 [inline]
   slab_alloc mm/slub.c:3214 [inline]
   kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
   btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline]
   btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833
   __btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415
   btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570
   btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768
   btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905
   btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530
   btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783
   lookup_open+0x660/0x780 fs/namei.c:3282
   open_last_lookups fs/namei.c:3352 [inline]
   path_openat+0x465/0xe20 fs/namei.c:3557
   do_filp_open+0xe3/0x170 fs/namei.c:3588
   do_sys_openat2+0x357/0x4a0 fs/open.c:1200
   do_sys_open+0x87/0xd0 fs/open.c:1216
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x46ae99
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
  89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
  01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
  RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99
  RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800
  RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017
  R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0

  ================================================
  WARNING: lock held when returning to user space!
  5.15.0-rc1 #16 Not tainted
  ------------------------------------------------
  syz-executor/7579 is leaving the kernel with locks still held!
  1 lock held by syz-executor/7579:
   #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at:
  __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112

[CAUSE]
In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new
extent buffer @buf is locked, but if a later operation, such as adding
the delayed tree ref, fails, we just free @buf without unlocking it,
resulting in the above warning.

[FIX]
Unlock @buf at the out_free_buf: label.

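The cleanup-path bug described under [CAUSE] can be illustrated with a
small user-space sketch. This is not the btrfs code: toy_buf, toy_lock(),
toy_unlock(), toy_free() and toy_alloc_block() are hypothetical stand-ins,
and the fail_later flag only simulates the injected ENOMEM. The
unlock_on_error flag switches between the buggy cleanup (free while still
locked) and the fixed one (unlock at the out_free_buf: label, then free).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for an extent buffer; NOT the kernel's struct extent_buffer. */
struct toy_buf {
	bool locked;
};

/* Records whether the most recently freed buffer was still locked. */
static bool freed_while_locked;

static void toy_lock(struct toy_buf *b)
{
	assert(!b->locked);
	b->locked = true;
}

static void toy_unlock(struct toy_buf *b)
{
	assert(b->locked);
	b->locked = false;
}

static void toy_free(struct toy_buf *b)
{
	freed_while_locked = b->locked;
	free(b);
}

/*
 * Allocate a buffer, lock it, then run a later step that may fail.
 * fail_later simulates the injected allocation failure (hypothetical knob).
 * unlock_on_error selects the fixed (true) or buggy (false) cleanup.
 * Returns the still-locked buffer on success, NULL on failure.
 */
static struct toy_buf *toy_alloc_block(bool fail_later, bool unlock_on_error)
{
	struct toy_buf *buf = calloc(1, sizeof(*buf));

	if (!buf)
		return NULL;

	toy_lock(buf);			/* buffer is locked from here on */

	if (fail_later)
		goto out_free_buf;	/* a later step failed */

	return buf;

out_free_buf:
	if (unlock_on_error)
		toy_unlock(buf);	/* the fix: drop the lock before freeing */
	toy_free(buf);
	return NULL;
}
```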
Reported-by: Hao Sun <redacted>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBxx9R99L4xdn-Q@mail.gmail.com/
Signed-off-by: Qu Wenruo <redacted>
I found the following lockdep report. It has been showing up with recent
misc-next, and the functions on the stack match what this patch touches,
but I haven't done a deeper analysis, so this could be a false trace
(though the warning seems legit).

The workload was some file creation/copy and relocation, but I don't have
more specific information.
The problem is known and is always triggered by btrfs/187, for example.
It's unrelated to multi-device filesystems or to Qu's patch.

The problem is that replace_path() locks nodes from a relocation tree
and then locks nodes from a subvolume tree, while other paths (like
replace_file_extents(), amongst others) do the opposite, locking first
from a subvolume tree and then from a relocation tree.

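A minimal sketch of why two opposite acquisition orders get flagged,
assuming only two lock classes. This is not how lockdep is implemented;
CLASS_SUBVOL, CLASS_RELOC, record_order() and reset_orders() are
hypothetical names used for illustration. The idea is to remember every
"took B while holding A" pair and to refuse a new pair when the reverse
order is already reachable, since two tasks following those orders could
each end up waiting for the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two illustrative lock classes (hypothetical, not lockdep's own names). */
enum { CLASS_SUBVOL, CLASS_RELOC, NUM_CLASSES };

/* dep[a][b] is true once some path took class b while holding class a. */
static bool dep[NUM_CLASSES][NUM_CLASSES];

/* Is `to` reachable from `from` by following recorded dependencies? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquire `wanted` while holding `held`".
 * Returns false, without recording anything, if the opposite order is
 * already known: that is the ABBA pattern that can deadlock.
 */
static bool record_order(int held, int wanted)
{
	bool seen[NUM_CLASSES] = { false };

	assert(held >= 0 && held < NUM_CLASSES);
	assert(wanted >= 0 && wanted < NUM_CLASSES);

	if (reachable(wanted, held, seen))
		return false;
	dep[held][wanted] = true;
	return true;
}

/* Forget all recorded orders. */
static void reset_orders(void)
{
	memset(dep, 0, sizeof(dep));
}
```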
I reported this to Josef on Slack long ago; it started happening
shortly after the switch to rw semaphores for btree locks.
Unfortunately, I don't think the problem is simple to fix.

[10898.966572] BTRFS info (device sdd10): balance: start -musage=50 -susage=50
[10898.980261] BTRFS info (device sdd10): relocating block group 195519578112 flags system
[10906.757623] BTRFS info (device sdd10): relocating block group 191148064768 flags metadata
[11401.635794]
[11401.637392] ======================================================
[11401.643647] WARNING: possible circular locking dependency detected
[11401.649956] 5.15.0-rc1-git+ #810 Not tainted
[11401.654315] ------------------------------------------------------
[11401.660588] btrfs/11698 is trying to acquire lock:
[11401.665467] ffff8bb384bf7068 (btrfs-treloc-03#2){+.+.}-{3:3}, at: __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11401.675102]
[11401.675102] but task is already holding lock:
[11401.681029] ffff8bb3ce5a4110 (btrfs-tree-02/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11401.690417]
[11401.690417] which lock already depends on the new lock.
[11401.690417]
[11401.698700]
[11401.698700] the existing dependency chain (in reverse order) is:
[11401.706272]
[11401.706272] -> #2 (btrfs-tree-02/1){+.+.}-{3:3}:
[11401.712473]        __lock_acquire+0x3e5/0x730
[11401.716906]        lock_acquire.part.0+0x5f/0x190
[11401.721700]        down_write_nested+0x49/0x130
[11401.726310]        __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11401.731744]        btrfs_init_new_buffer+0x7d/0x2c0 [btrfs]
[11401.737510]        btrfs_alloc_tree_block+0x13b/0x350 [btrfs]
[11401.743459]        __btrfs_cow_block+0x144/0x600 [btrfs]
[11401.748967]        btrfs_cow_block+0x107/0x160 [btrfs]
[11401.754306]        btrfs_search_slot+0x67a/0xc00 [btrfs]
[11401.759809]        btrfs_lookup_dir_item+0x7c/0xe0 [btrfs]
[11401.765507]        __btrfs_unlink_inode+0xaa/0x4f0 [btrfs]
[11401.771184]        btrfs_unlink+0x87/0x100 [btrfs]
[11401.776154]        vfs_unlink+0x101/0x210
[11401.780250]        do_unlinkat+0x19c/0x2c0
[11401.784435]        __x64_sys_unlinkat+0x34/0x60
[11401.789057]        do_syscall_64+0x3d/0xb0
[11401.793253]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[11401.798914]
[11401.798914] -> #1 (btrfs-tree-02){++++}-{3:3}:
[11401.804947]        __lock_acquire+0x3e5/0x730
[11401.809381]        lock_acquire.part.0+0x5f/0x190
[11401.814185]        down_write_nested+0x49/0x130
[11401.818810]        __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11401.824247]        btrfs_search_slot+0x2a1/0xc00 [btrfs]
[11401.829759]        do_relocation+0x12c/0x6f0 [btrfs]
[11401.834942]        relocate_tree_block+0x1a6/0x270 [btrfs]
[11401.840646]        relocate_tree_blocks+0xe8/0x260 [btrfs]
[11401.846358]        relocate_block_group+0x200/0x580 [btrfs]
[11401.852146]        btrfs_relocate_block_group+0x18b/0x350 [btrfs]
[11401.858455]        btrfs_relocate_chunk+0x38/0x120 [btrfs]
[11401.864166]        __btrfs_balance+0x2ea/0x490 [btrfs]
[11401.869511]        btrfs_balance+0x4d3/0x7c0 [btrfs]
[11401.874684]        btrfs_ioctl_balance+0x31c/0x3e0 [btrfs]
[11401.880382]        __x64_sys_ioctl+0x83/0xa0
[11401.884736]        do_syscall_64+0x3d/0xb0
[11401.888924]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[11401.894573]
[11401.894573] -> #0 (btrfs-treloc-03#2){+.+.}-{3:3}:
[11401.900946]        check_prev_add+0x91/0xc30
[11401.905293]        validate_chain+0x56f/0x840
[11401.909740]        __lock_acquire+0x3e5/0x730
[11401.914186]        lock_acquire.part.0+0x5f/0x190
[11401.918979]        down_write_nested+0x49/0x130
[11401.923607]        __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11401.929064]        btrfs_lock_root_node+0x31/0x40 [btrfs]
[11401.934671]        btrfs_search_slot+0x58e/0xc00 [btrfs]
[11401.940164]        replace_path+0x57e/0xc70 [btrfs]
[11401.945241]        merge_reloc_root+0x222/0x8c0 [btrfs]
[11401.950667]        merge_reloc_roots+0xf0/0x290 [btrfs]
[11401.956119]        relocate_block_group+0x2d7/0x580 [btrfs]
[11401.961910]        btrfs_relocate_block_group+0x18b/0x350 [btrfs]
[11401.968204]        btrfs_relocate_chunk+0x38/0x120 [btrfs]
[11401.973905]        __btrfs_balance+0x2ea/0x490 [btrfs]
[11401.979239]        btrfs_balance+0x4d3/0x7c0 [btrfs]
[11401.984423]        btrfs_ioctl_balance+0x31c/0x3e0 [btrfs]
[11401.990117]        __x64_sys_ioctl+0x83/0xa0
[11401.994456]        do_syscall_64+0x3d/0xb0
[11401.998642]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[11402.004323]
[11402.004323] other info that might help us debug this:
[11402.004323]
[11402.012465] Chain exists of:
[11402.012465]   btrfs-treloc-03#2 --> btrfs-tree-02 --> btrfs-tree-02/1
[11402.012465]
[11402.023382]  Possible unsafe locking scenario:
[11402.023382]
[11402.029395]        CPU0                    CPU1
[11402.034004]        ----                    ----
[11402.038622]   lock(btrfs-tree-02/1);
[11402.042287]                                lock(btrfs-tree-02);
[11402.048288]                                lock(btrfs-tree-02/1);
[11402.054476]   lock(btrfs-treloc-03#2);
[11402.060166]
[11402.060166]  *** DEADLOCK ***
[11402.060166]
[11402.066212] 5 locks held by btrfs/11698:
[11402.070210]  #0: ffff8bb4fab09478 (sb_writers#9){.+.+}-{0:0}, at: btrfs_ioctl_balance+0x47/0x3e0 [btrfs]
[11402.079923]  #1: ffff8bb48520a370 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: __btrfs_balance+0x179/0x490 [btrfs]
[11402.090584]  #2: ffff8bb4852088e0 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x183/0x350 [btrfs]
[11402.101965]  #3: ffff8bb4fab09698 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0x11c/0x8c0 [btrfs]
[11402.111616]  #4: ffff8bb3ce5a4110 (btrfs-tree-02/1){+.+.}-{3:3}, at: __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11402.121418]
[11402.121418] stack backtrace:
[11402.125880] CPU: 7 PID: 11698 Comm: btrfs Not tainted 5.15.0-rc1-git+ #810
[11402.132843] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[11402.139470] Call Trace:
[11402.141989]  dump_stack_lvl+0x45/0x59
[11402.145736]  check_noncircular+0xf3/0x110
[11402.149827]  ? check_irq_usage+0xaa/0x3f0
[11402.153943]  check_prev_add+0x91/0xc30
[11402.157783]  validate_chain+0x56f/0x840
[11402.161719]  __lock_acquire+0x3e5/0x730
[11402.165654]  lock_acquire.part.0+0x5f/0x190
[11402.169924]  ? __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11402.175013]  ? lock_acquire+0xa0/0x150
[11402.178839]  ? __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11402.183949]  down_write_nested+0x49/0x130
[11402.188043]  ? __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11402.193127]  __btrfs_tree_lock+0x2c/0x140 [btrfs]
[11402.198039]  btrfs_lock_root_node+0x31/0x40 [btrfs]
[11402.203148]  btrfs_search_slot+0x58e/0xc00 [btrfs]
[11402.208117]  replace_path+0x57e/0xc70 [btrfs]
[11402.212680]  merge_reloc_root+0x222/0x8c0 [btrfs]
[11402.217593]  ? lock_release+0x68/0x140
[11402.221427]  ? _raw_spin_unlock+0x1f/0x40
[11402.225522]  merge_reloc_roots+0xf0/0x290 [btrfs]
[11402.230439]  relocate_block_group+0x2d7/0x580 [btrfs]
[11402.235721]  btrfs_relocate_block_group+0x18b/0x350 [btrfs]
[11402.241521]  btrfs_relocate_chunk+0x38/0x120 [btrfs]
[11402.246688]  __btrfs_balance+0x2ea/0x490 [btrfs]
[11402.251524]  btrfs_balance+0x4d3/0x7c0 [btrfs]
[11402.256181]  btrfs_ioctl_balance+0x31c/0x3e0 [btrfs]
[11402.261377]  __x64_sys_ioctl+0x83/0xa0
[11402.265205]  do_syscall_64+0x3d/0xb0
[11402.268853]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[11402.273985] RIP: 0033:0x7fedef5ff6c7
[11402.277642] Code: 00 00 00 48 8b 05 c1 67 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 67 2b 00 f7 d8 64 89 01 48
[11402.296539] RSP: 002b:00007ffda9289e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[11402.304211] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fedef5ff6c7
[11402.311439] RDX: 00007ffda9289f10 RSI: 00000000c4009420 RDI: 0000000000000003
[11402.318657] RBP: 00007ffda928c866 R08: 00000000004c4cab R09: 0000000000000013
[11402.325877] R10: 0000000022494966 R11: 0000000000000246 R12: 0000000000000001
[11402.333115] R13: 00007ffda9289f10 R14: 0000000000000000 R15: 00007ffda9289f08
[11432.564100] BTRFS info (device sdd10): found 30943 extents, stage: move data extents


--
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”