Thread: 8 messages, 2 authors, 2021-12-16

Re: Balance loop on "device delete missing"? (RAID 1, Linux 5.15, "found 1 extents, stage: update data pointers")

From: Lukas Pirl <hidden>
Date: 2021-12-05 11:54:17

Hello Zygo,

it took me (and the disks) a while to report back; here we go:

On Thu, 2021-12-02 13:11 -0500, Zygo Blaxell wrote as excerpted:
> On Thu, 2021-11-25 19:06 +0100, Lukas Pirl wrote as excerpted:
> > Dear btrfs community,
> >
> > this is another report of a probably endless balance which loops on
> > "found 1 extents, stage: update data pointers".
> >
> > I observe it on a btrfs RAID 1 on around 7 LUKS-encrypted spinning
> > disks (more fs details below) used for storing cold data. One disk
> > failed physically. Now I am trying to "btrfs device delete missing".
> > The operation apparently never finishes (I once waited more than 30
> > days, another time more than 50 days).
> >
> > dmesg says:
> > [      22:26] BTRFS info (device dm-1): relocating block group 1109204664320 flags data|raid1
> > [      22:27] BTRFS info (device dm-1): found 4164 extents, stage: move data extents
> > [  +5.476247] BTRFS info (device dm-1): found 4164 extents, stage: update data pointers
> > [  +2.545299] BTRFS info (device dm-1): found 1 extents, stage: update data pointers
> >
> > and then the last message repeats roughly every 0.25 seconds ("forever").
> > Memory and CPU usage are not excessive (most is IO wait, I assume).
> >
> > What I have tried:
> > * Linux 4.x (several minor versions; I don't remember exactly which)
> > * Linux 5.10
> > * Linux 5.15
> > * btrfs-progs v5.15
> > * removing subvolumes (before: ~200, after: ~90)
> > * free space cache v1, v2, none
> > * reboot, restart removal/balance (multiple times)
>
> Does it always happen on the same block group?  If so, that points to
> something lurking in your metadata.  If a reboot fixes it for one block
> group and then it gets stuck on some other block group, it points to
> an issue in kernel memory state.

I hadn't paid attention to the block group number in the past, but
another run of ``btrfs dev del`` just now stopped at the same last block
group (1109204664320) before, presumably, looping again.

> What do you get from 'btrfs check --readonly'?

$ btrfs check --readonly --mode lowmem /dev/disk/by-label/pool_16-03

[1/7] checking root items
Opening filesystem to check...
warning, device 6 is missing
Checking filesystem on /dev/disk/by-label/pool_16-03
UUID: 59301fea-434a-4c43-bb45-08fcfe8ce113
[2/7] checking extents
ERROR: extent[1109584044032, 8192] referencer count mismatch (root: 276, owner: 1154248, offset: 100401152) wanted: 1, have: 0
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 4252313206784 bytes used, error(s) found
total csum bytes: 4128183360
total tree bytes: 25053184000
total fs tree bytes: 16415014912
total extent tree bytes: 3662594048
btree space waste bytes: 4949241278
file data blocks allocated: 8025128243200
 referenced 7552211206144
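The relocation stalls in block group 1109204664320, and `btrfs check` flags a referencer count mismatch on the extent at 1109584044032. The sketch below parses the dmesg line quoted above and tests whether that extent lies inside the block group. It assumes a 1 GiB data block group length, which is typical for data chunks but is not shown anywhere in this output; the helper names are hypothetical.

```python
import re

# Assumption: data block groups on this filesystem are 1 GiB long.
# The real length should be read from the chunk tree, not guessed.
ASSUMED_BG_LEN = 1 << 30

# Matches the kernel message quoted in the report, e.g.
# "relocating block group 1109204664320 flags data|raid1"
RELOC_RE = re.compile(r"relocating block group (\d+) flags (\S+)")


def block_group_from_dmesg(line):
    """Return (start, flags) from a 'relocating block group' line, or None."""
    m = RELOC_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def extent_in_block_group(extent_start, extent_len, bg_start, bg_len=ASSUMED_BG_LEN):
    """True if [extent_start, extent_start + extent_len) lies within the block group."""
    return bg_start <= extent_start and extent_start + extent_len <= bg_start + bg_len


line = ("[      22:26] BTRFS info (device dm-1): relocating block group "
        "1109204664320 flags data|raid1")
bg_start, flags = block_group_from_dmesg(line)

# Extent reported by 'btrfs check': extent[1109584044032, 8192]
inside = extent_in_block_group(1109584044032, 8192, bg_start)
print(bg_start, flags, inside)
```

Under that assumption the extent sits roughly 362 MiB into the block group, which would make it a plausible candidate for the extent relocation keeps revisiting. Confirming it needs the actual block group length, e.g. from the chunk tree via `btrfs inspect-internal dump-tree`.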

Thanks for your help

Lukas
======================================================================

filesystem show
===============

Label: 'pool_16-03'  uuid: 59301fea-434a-xxxx-bb45-08fcfe8ce113
        Total devices 8 FS bytes used 3.84TiB
        devid    1 size 931.51GiB used 592.00GiB path /dev/mapper/WD-WCAU45xxxx03
        devid    3 size 1.82TiB used 1.37TiB path /dev/mapper/WD-WCAZAFxxxx78
        devid    4 size 931.51GiB used 593.00GiB path /dev/mapper/WD-WCC4J7xxxxSZ
        devid    5 size 1.82TiB used 1.46TiB path /dev/mapper/WD-WCC4M2xxxxXH
        devid    7 size 931.51GiB used 584.00GiB path /dev/mapper/S1xxxxJ3
        devid    9 size 2.73TiB used 2.28TiB path /dev/mapper/WD-WCC4N3xxxx17
        devid   10 size 3.64TiB used 1.03TiB path /dev/mapper/WD-WCC7K2xxxxNS
        *** Some devices missing

subvolumes
==========

~ 90, of which ~ 60 are read-only snapshots of the other ~ 30

filesystem usage
================

Overall:
    Device size:                  12.74TiB
    Device allocated:              8.36TiB
    Device unallocated:            4.38TiB
    Device missing:                  0.00B
    Used:                          7.69TiB
    Free (estimated):              2.50TiB      (min: 2.50TiB)
    Free (statfs, df):             1.46TiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 48.00KiB)
    Multiple profiles:                  no

Data,RAID1: Size:4.14TiB, Used:3.82TiB (92.33%)
   /dev/mapper/WD-WCAU45xxxx03   584.00GiB
   /dev/mapper/WD-WCAZAFxxxx78     1.35TiB
   /dev/mapper/WD-WCC4J7xxxxSZ   588.00GiB
   /dev/mapper/WD-WCC4M2xxxxXH     1.44TiB
   missing       510.00GiB
   /dev/mapper/S1xxxxJ3  579.00GiB
   /dev/mapper/WD-WCC4N3xxxx17     2.26TiB
   /dev/mapper/WD-WCC7K2xxxxNS     1.01TiB

Metadata,RAID1: Size:41.00GiB, Used:23.14GiB (56.44%)
   /dev/mapper/WD-WCAU45xxxx03     8.00GiB
   /dev/mapper/WD-WCAZAFxxxx78    17.00GiB
   /dev/mapper/WD-WCC4J7xxxxSZ     5.00GiB
   /dev/mapper/WD-WCC4M2xxxxXH    13.00GiB
   missing         3.00GiB
   /dev/mapper/S1xxxxJ3    5.00GiB
   /dev/mapper/WD-WCC4N3xxxx17    16.00GiB
   /dev/mapper/WD-WCC7K2xxxxNS    15.00GiB

System,RAID1: Size:32.00MiB, Used:848.00KiB (2.59%)
   missing        32.00MiB
   /dev/mapper/WD-WCC4N3xxxx17    32.00MiB

Unallocated:
   /dev/mapper/WD-WCAU45xxxx03   339.51GiB
   /dev/mapper/WD-WCAZAFxxxx78   461.01GiB
   /dev/mapper/WD-WCC4J7xxxxSZ   338.51GiB
   /dev/mapper/WD-WCC4M2xxxxXH   373.01GiB
   missing      -513.03GiB
   /dev/mapper/S1xxxxJ3  347.51GiB
   /dev/mapper/WD-WCC4N3xxxx17   460.47GiB
   /dev/mapper/WD-WCC7K2xxxxNS     2.61TiB
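The negative unallocated figure for `missing` follows from the per-profile listing: the 12.74TiB device size only adds up the present disks, so the missing device counts as size zero while it still holds its share of RAID1 chunks. Adding those up reproduces the number. This is a back-of-the-envelope check on the rounded values above, not output from any btrfs tool.

```python
# Rounded per-profile allocations on the missing device, copied from the
# 'filesystem usage' output above (GiB; System is 32.00MiB).
allocated_on_missing_gib = {
    "Data,RAID1": 510.00,
    "Metadata,RAID1": 3.00,
    "System,RAID1": 32.00 / 1024,
}

# Assumption: a missing device contributes a size of 0 to the totals.
missing_size_gib = 0.0

total_allocated = sum(allocated_on_missing_gib.values())
unallocated = missing_size_gib - total_allocated

print(f"allocated on missing: {total_allocated:.2f}GiB")
print(f"unallocated on missing: {unallocated:.2f}GiB")
```

So roughly 513 GiB of RAID1 chunks still have one of their two copies on the failed disk; that is the amount `btrfs device delete missing` has to re-replicate onto the remaining devices.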

dump-super
==========

superblock: bytenr=65536, device=/dev/mapper/WD-WCAU45xxxx03
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x51beb068 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    59301fea-434a-xxxx-bb45-08fcfe8ce113
metadata_uuid           59301fea-434a-xxxx-bb45-08fcfe8ce113
label                   pool_16-03
generation              113519755
root                    15602414796800
sys_array_size          129
chunk_root_generation   63394299
root_level              1
chunk_root              19216820502528
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             16003136864256
bytes_used              4227124142080
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             8
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x371
                        ( MIXED_BACKREF |
                          COMPRESS_ZSTD |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          NO_HOLES )
cache_generation        2975866
uuid_tree_generation    113519755
dev_item.uuid           a9b2e4ea-404c-xxxx-a450-dc84b0956ce1
dev_item.fsid           59301fea-434a-xxxx-bb45-08fcfe8ce113 [match]
dev_item.type           0
dev_item.total_bytes    1000201740288
dev_item.bytes_used     635655159808
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
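`incompat_flags 0x371` is a bitmask, and the names dump-super prints under it are its set bits. The sketch below decodes it using the bit positions of the kernel's `BTRFS_FEATURE_INCOMPAT_*` flags; the table deliberately stops at bit 9 and is not a complete list.

```python
# Bit positions of btrfs incompat feature flags, bits 0-9 only
# (BTRFS_FEATURE_INCOMPAT_* in the kernel's on-disk format headers).
INCOMPAT_FLAGS = {
    1 << 0: "MIXED_BACKREF",
    1 << 1: "DEFAULT_SUBVOL",
    1 << 2: "MIXED_GROUPS",
    1 << 3: "COMPRESS_LZO",
    1 << 4: "COMPRESS_ZSTD",
    1 << 5: "BIG_METADATA",
    1 << 6: "EXTENDED_IREF",
    1 << 7: "RAID56",
    1 << 8: "SKINNY_METADATA",
    1 << 9: "NO_HOLES",
}


def decode_incompat(value):
    """Return (names of known set bits in bit order, leftover unknown bits)."""
    names = [name for bit, name in INCOMPAT_FLAGS.items() if value & bit]
    known = 0
    for bit in INCOMPAT_FLAGS:
        known |= bit
    return names, value & ~known


names, unknown = decode_incompat(0x371)
print(" | ".join(names), hex(unknown))
```

The decoded names match the dump-super listing exactly; RAID56 (bit 7) is not set, consistent with a pool that uses only RAID1.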

device stats
============

[/dev/mapper/WD-WCAU45xxxx03].write_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].read_io_errs     0
[/dev/mapper/WD-WCAU45xxxx03].flush_io_errs    0
[/dev/mapper/WD-WCAU45xxxx03].corruption_errs  0
[/dev/mapper/WD-WCAU45xxxx03].generation_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].write_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].read_io_errs     0
[/dev/mapper/WD-WCAZAFxxxx78].flush_io_errs    0
[/dev/mapper/WD-WCAZAFxxxx78].corruption_errs  0
[/dev/mapper/WD-WCAZAFxxxx78].generation_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].write_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
[/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
[/dev/mapper/WD-WCC4J7xxxxSZ].corruption_errs  0
[/dev/mapper/WD-WCC4J7xxxxSZ].generation_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].write_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].read_io_errs     0
[/dev/mapper/WD-WCC4M2xxxxXH].flush_io_errs    0
[/dev/mapper/WD-WCC4M2xxxxXH].corruption_errs  0
[/dev/mapper/WD-WCC4M2xxxxXH].generation_errs  0
[devid:6].write_io_errs    0
[devid:6].read_io_errs     0
[devid:6].flush_io_errs    0
[devid:6].corruption_errs  72016
[devid:6].generation_errs  100
[/dev/mapper/S1xxxxJ3].write_io_errs    0
[/dev/mapper/S1xxxxJ3].read_io_errs     0
[/dev/mapper/S1xxxxJ3].flush_io_errs    0
[/dev/mapper/S1xxxxJ3].corruption_errs  2
[/dev/mapper/S1xxxxJ3].generation_errs  0
[/dev/mapper/WD-WCC4N3xxxx17].write_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].read_io_errs     0
[/dev/mapper/WD-WCC4N3xxxx17].flush_io_errs    0
[/dev/mapper/WD-WCC4N3xxxx17].corruption_errs  0
[/dev/mapper/WD-WCC4N3xxxx17].generation_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].write_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].read_io_errs     0
[/dev/mapper/WD-WCC7K2xxxxNS].flush_io_errs    0
[/dev/mapper/WD-WCC7K2xxxxNS].corruption_errs  0
[/dev/mapper/WD-WCC7K2xxxxNS].generation_errs  0
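With eight devices the stats listing is long; the only non-zero counters are the corruption and generation errors on the missing devid 6, one read error on WD-WCC4J7xxxxSZ and two corruption errors on S1xxxxJ3. A small filter makes that easy to see. It assumes the `[device].counter value` line format shown above and is a hypothetical helper, not part of btrfs-progs.

```python
import re

# Hypothetical helper: parses lines of the form
# "[/dev/mapper/NAME].counter_name   VALUE" as printed by 'btrfs device stats'.
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$")


def nonzero_stats(text):
    """Return {device: {counter: value}} for counters that are not zero."""
    result = {}
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m is None:
            continue  # skip headings, blank lines and anything unexpected
        value = int(m.group("value"))
        if value:
            result.setdefault(m.group("dev"), {})[m.group("counter")] = value
    return result


# Excerpt of the listing above (zero-valued lines mostly omitted).
sample = """\
[/dev/mapper/WD-WCC4J7xxxxSZ].read_io_errs     1
[/dev/mapper/WD-WCC4J7xxxxSZ].flush_io_errs    0
[devid:6].corruption_errs  72016
[devid:6].generation_errs  100
[/dev/mapper/S1xxxxJ3].corruption_errs  2
"""

for dev, counters in nonzero_stats(sample).items():
    print(dev, counters)
```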
  