Thread: 11 messages, 2 authors, 2021-10-26

Re: Errors after successful disk replace

From: Emil Heimpel <hidden>
Date: 2021-10-19 10:49:11

Oct 19, 2021 07:35:54 Qu Wenruo [off-list ref]:

On 2021/10/19 11:54, Emil Heimpel wrote:
Hi all,

One of the drives in my raid 5 btrfs array failed (it was completely dead), so I installed an identical replacement drive. The dead drive was devid 1 and the new drive is /dev/sde. I used the following to replace the missing drive:

sudo btrfs replace start -B 1 /dev/sde1 /mnt/btrfsrepair/

and it completed successfully without any reported errors (took around 2 weeks though...).

I then tried to see my array with filesystem show, but it hung (or took longer than I wanted to wait), so I did a reboot.
Any dmesg of that time?
Nothing after the replace finished:

1634463961.245751 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663044222976 for dev (efault)
1634463961.255819 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663045795840 for dev (efault)
1634463961.275815 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663046582272 for dev (efault)
1634463961.275922 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663047368704 for dev (efault)
1634463961.339074 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663048155136 for dev (efault)
1634463961.339248 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663048941568 for dev (efault)
1634475910.611261 BlueQ kernel: sd 9:0:2:0: attempting task abort!scmd(0x0000000046fead3f), outstanding for 7120 ms & timeout 7000 ms
1634475910.615126 BlueQ kernel: sd 9:0:2:0: [sdd] tag#840 CDB: ATA command pass through(16) 85 08 2e 00 00 00 01 00 00 00 00 00 00 00 ec 00
1634475910.615429 BlueQ kernel: scsi target9:0:2: handle(0x000b), sas_address(0x4433221105000000), phy(5)
1634475910.615691 BlueQ kernel: scsi target9:0:2: enclosure logical id(0x590b11c022f3fb00), slot(6)
1634475910.787911 BlueQ kernel: sd 9:0:2:0: task abort: SUCCESS scmd(0x0000000046fead3f)
1634475910.807083 BlueQ kernel: sd 9:0:2:0: Power-on or device reset occurred
1634475949.877998 BlueQ kernel: sd 9:0:2:0: Power-on or device reset occurred
1634525944.213931 BlueQ kernel: perf: interrupt took too long (3138 > 3137), lowering kernel.perf_event_max_sample_rate to 63600
1634533791.168760 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 22996545634304 for dev (efault)
1634552685.203559 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 23816815706112 for dev (efault)
1634558977.979621 BlueQ kernel: BTRFS info (device sdb1): dev_replace from <missing disk> (devid 1) to /dev/sde1 finished
1634560793.132731 BlueQ kernel: zram0: detected capacity change from 32610864 to 0
1634560793.169379 BlueQ kernel: zram: Removed device: zram0
1634560883.549481 BlueQ kernel: watchdog: watchdog0: watchdog did not stop!
1634560883.556038 BlueQ systemd-shutdown[1]: Syncing filesystems and block devices.
1634560883.572840 BlueQ systemd-shutdown[1]: Sending SIGTERM to remaining processes...
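
For reading a log like the one above, a small sketch may help. It assumes the leading number on each line is seconds since the Unix epoch, and it only uses three lines copied from the log as sample input; `parse_line` and `summarize` are hypothetical helper names, not part of any btrfs tool.

```python
from datetime import datetime, timezone

# Sample input: three lines copied verbatim from the log above.
LOG = """\
1634463961.245751 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 17663044222976 for dev (efault)
1634552685.203559 BlueQ kernel: BTRFS error (device sdb1): failed to rebuild valid logical 23816815706112 for dev (efault)
1634558977.979621 BlueQ kernel: BTRFS info (device sdb1): dev_replace from <missing disk> (devid 1) to /dev/sde1 finished
"""


def parse_line(line):
    """Split one log line into (UTC datetime, rest of the line).

    Assumption: the first field is a Unix timestamp in seconds.
    """
    stamp, _host, message = line.split(" ", 2)
    return datetime.fromtimestamp(float(stamp), tz=timezone.utc), message


def summarize(log):
    """Count rebuild errors and find when dev_replace reported 'finished'."""
    entries = [parse_line(l) for l in log.splitlines() if l.strip()]
    rebuild_errors = [m for _, m in entries
                      if "failed to rebuild valid logical" in m]
    finished = [t for t, m in entries
                if "dev_replace" in m and "finished" in m]
    return len(rebuild_errors), (finished[0] if finished else None)


count, finished_at = summarize(LOG)
print(count, finished_at.isoformat(timespec="seconds"))
# 2 2021-10-18T12:09:37+00:00
```

Fed the full log instead of the three sample lines, the same function would count every "failed to rebuild valid logical" entry, which makes it easier to see that those errors were logged before the replace reported that it had finished.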



It showed up after a reboot as follows:

Label: 'BlueButter'  uuid: 7e3378e6-da46-4a60-b9b8-1bcc306986e3
        Total devices 6 FS bytes used 20.96TiB
        devid    0 size 7.28TiB used 5.46TiB path /dev/sde1
        devid    2 size 7.28TiB used 5.46TiB path /dev/sdb1
        devid    3 size 2.73TiB used 2.73TiB path /dev/sdg1
        devid    4 size 2.73TiB used 2.73TiB path /dev/sdd1
        devid    5 size 7.28TiB used 4.81TiB path /dev/sdf1
        devid    6 size 7.28TiB used 5.33TiB path /dev/sdc1
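
The odd part of this listing is the devid column: devid 1 is absent and a devid 0 is present. As far as I know, devid 0 is the id btrfs gives the target device while a replace is still in progress, but treat that as an assumption. A minimal sketch that checks the listing mechanically, assuming the array originally had devids 1 through 6 (the listing says "Total devices 6" and the failed drive was devid 1); `parse_devices` is a hypothetical helper:

```python
import re

# The device lines copied from the listing above.
SHOW = """\
        devid    0 size 7.28TiB used 5.46TiB path /dev/sde1
        devid    2 size 7.28TiB used 5.46TiB path /dev/sdb1
        devid    3 size 2.73TiB used 2.73TiB path /dev/sdg1
        devid    4 size 2.73TiB used 2.73TiB path /dev/sdd1
        devid    5 size 7.28TiB used 4.81TiB path /dev/sdf1
        devid    6 size 7.28TiB used 5.33TiB path /dev/sdc1
"""

DEVICE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)


def parse_devices(text):
    """Return {devid: (size_tib, used_tib, path)} for each devid line."""
    return {int(m[1]): (float(m[2]), float(m[3]), m[4])
            for m in DEVICE.finditer(text)}


devices = parse_devices(SHOW)
expected = set(range(1, 7))  # assumption: devids 1..6 before the failure
missing = sorted(expected - set(devices))
unexpected = sorted(set(devices) - expected)
total_tib = round(sum(size for size, _, _ in devices.values()), 2)

print("missing devids:", missing)        # missing devids: [1]
print("unexpected devids:", unexpected)  # unexpected devids: [0]
print("sum of sizes (TiB):", total_tib)  # sum of sizes (TiB): 34.58
```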

I then tried to mount it, but it failed, so I ran a read-only check, which reported the following problem:
And dmesg for the failed mount?
Oops, I must have missed that it failed because of missing devid 1 too...

1634562944.145383 BlueQ kernel: BTRFS info (device sde1): flagging fs with big metadata feature
1634562944.145529 BlueQ kernel: BTRFS info (device sde1): force zstd compression, level 2
1634562944.145650 BlueQ kernel: BTRFS info (device sde1): using free space tree
1634562944.145697 BlueQ kernel: BTRFS info (device sde1): has skinny extents
1634562944.148709 BlueQ kernel: BTRFS error (device sde1): devid 1 uuid 51645efd-bf95-458d-b5ae-b31623533abb is missing
1634562944.148764 BlueQ kernel: BTRFS error (device sde1): failed to read chunk tree: -2
1634562944.185369 BlueQ kernel: BTRFS error (device sde1): open_ctree failed
Thanks,
Qu
[...]
[2/7] checking extents
ERROR: super total bytes 38007432437760 smaller than real device(s) size 46008994590720
ERROR: mounting this fs may fail for newer kernels
ERROR: this can be fixed by 'btrfs rescue fix-device-size'
[3/7] checking free space tree
[...]

So I followed that advice but got the following error:

sudo btrfs rescue fix-device-size /dev/sde1
ERROR: devid 1 is missing or not writeable
ERROR: fixing device size needs all device(s) to be present and writeable

So it seems something went wrong or didn't complete fully.
What can I do to fix this problem?

uname -a
Linux BlueQ 5.14.12-arch1-1 #1 SMP PREEMPT Wed, 13 Oct 2021 16:58:16 +0000 x86_64 GNU/Linux

btrfs --version
btrfs-progs v5.14.2

Regards,
Emil

P.S.: Yes, I know, raid5 isn't stable, but it works well enough for me ;)
Metadata is raid1 btw...
  