Thread: 22 messages, 5 authors, 2022-06-21

Re: bisected: btrfs dedupe regression in v5.11-rc1: 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()

From: Nikolay Borisov <hidden>
Date: 2021-12-14 11:11:27


On 14.12.21, 1:12, Zygo Blaxell wrote:
> On Mon, Dec 13, 2021 at 03:28:26PM +0200, Nikolay Borisov wrote:
>> On 10.12.21, 20:34, Zygo Blaxell wrote:
>>> I've been getting deadlocks in dedupe on btrfs since kernel 5.11, and
>>> some bees users have reported it as well.  I bisected to this commit:
>>>
>>> 	3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
>>>
>>> These kernels work for at least 18 hours:
>>>
>>> 	5.10.83 (months)
>>> 	5.11.22 with 3078d85c9a10 reverted (36 hours)
>>> 	btrfs misc-next 66dc4de326b0 with 3078d85c9a10 reverted
>>>
>>> These kernels lock up in 3 hours or less:
>>>
>>> 	5.11.22
>>> 	5.12.19
>>> 	5.14.21
>>> 	5.15.6
>>> 	btrfs for-next 279373dee83e
>>>
>>> All of the failing kernels include this commit, none of the non-failing
>>> kernels include the commit.
>>>
>>> Kernel logs from the lockup:
>>>
>>> 	[19647.696042][ T3721] sysrq: Show Blocked State
>>> 	[19647.697024][ T3721] task:btrfs-transacti state:D stack:    0 pid: 6161 ppid:     2 flags:0x00004000
>>> 	[19647.698203][ T3721] Call Trace:
>>> 	[19647.698608][ T3721]  __schedule+0x388/0xaf0
>>> 	[19647.699125][ T3721]  schedule+0x68/0xe0
>>> 	[19647.699615][ T3721]  btrfs_commit_transaction+0x97c/0xbf0
>> Can you run this through the symbolize script? I'd like to understand
>> where in the transaction commit the sleep is happening.
> 	btrfs_commit_transaction+0x97c/0xbf0:
>
> 	btrfs_commit_transaction at fs/btrfs/transaction.c:2159 (discriminator 9)
> 	 2154
> 	 2155           ret = btrfs_run_delayed_items(trans);
> 	 2156           if (ret)
> 	 2157                   goto cleanup_transaction;
> 	 2158
> 	>2159<          wait_event(cur_trans->writer_wait,
> 	 2160                      extwriter_counter_read(cur_trans) == 0);
> 	 2161
> 	 2162           /* some pending stuffs might be added after the previous flush. */
> 	 2163           ret = btrfs_run_delayed_items(trans);
> 	 2164           if (ret)
So it seems there is still an open transaction handle (the external
writer count never drops to 0), so the commit can't continue and
everything is stalled behind it. Would you be able to run the attached
Python script on a host that is stuck? It requires the debug symbols for
the kernel to be installed, as well as https://github.com/osandov/drgn/,
which is a scriptable debugger. The easiest way would be to follow the
instructions at https://drgn.readthedocs.io/en/latest/installation.html
and just get it via pip.


Once you have it installed, run it with:

"sudo drgn get-num-extwriters.py 310dd372-0fd1-4496-a232-0fb46ca4afd6"

where 310dd372-0fd1-4496-a232-0fb46ca4afd6 is the fsid of the wedged
filesystem, as reported by 'blkid'.
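For anyone following along: the symbolized listing shows btrfs_commit_transaction() sleeping in wait_event() on cur_trans->writer_wait until extwriter_counter_read() returns 0. If one external writer never releases its handle, the committer (and everything queued behind the commit) sleeps indefinitely. Below is a minimal userspace toy model of that accounting, written in Python. It is not btrfs code: TransactionModel, join(), end() and commit() are hypothetical names, and the timeout exists only so the demo terminates (wait_event() has no timeout).

```python
import threading


class TransactionModel:
    """Toy model of a transaction's external-writer accounting (hypothetical, not btrfs code)."""

    def __init__(self):
        self._cond = threading.Condition()
        # Stands in for the counter that extwriter_counter_read() returns.
        self._extwriters = 0

    def join(self):
        # A writer attaches to the running transaction and bumps the counter.
        with self._cond:
            self._extwriters += 1

    def end(self):
        # A writer releases its handle and wakes anyone waiting to commit.
        with self._cond:
            self._extwriters -= 1
            self._cond.notify_all()

    def commit(self, timeout):
        # Analogue of wait_event(writer_wait, extwriter_counter_read() == 0).
        # Returns True once the counter is 0, False if the timeout expired first.
        with self._cond:
            return self._cond.wait_for(lambda: self._extwriters == 0, timeout=timeout)


# Balanced join/end: the commit proceeds.
ok = TransactionModel()
ok.join()
ok.end()
print(ok.commit(timeout=0.1))      # True

# A handle that is never ended: the commit stalls (here, until the timeout).
leaked = TransactionModel()
leaked.join()
print(leaked.commit(timeout=0.1))  # False
```

In this model a single unbalanced join() is enough to block commit(), which matches the shape of the hang in the trace; the drgn script is what would show whether the real counter is stuck above zero.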



<snip>
