Thread (26 messages), 3 authors, 2021-02-21

Re: "bad tree block start" when trying to mount on ARM

From: Erik Jensen <hidden>
Date: 2021-02-18 04:04:40


On Wed, Feb 17, 2021 at 5:24 PM Qu Wenruo [off-list ref] wrote:
On 2021/2/11 7:47 AM, Qu Wenruo wrote:
On 2021/2/11 6:17 AM, Erik Jensen wrote:
On Tue, Feb 9, 2021 at 9:47 PM Qu Wenruo [off-list ref] wrote:
[...]
Unfortunately I didn't get much useful info from the trace events.
As a lot of the values don't even make sense to me....

But the chunk tree dump proves to be more useful.

Firstly, the offending tree block doesn't even fall within any chunk
range.

The offending tree block is 26207780683776, but the tree dump doesn't
have any range there.

The highest chunk is at 5958289850368 + 4294967296, still one digit
lower than the expected value.

I'm surprised we didn't even get any error for that, thus it may
indicate our chunk mapping is incorrect too.

Would you please try the following diff on the 32bit system and report
back the dmesg?

The diff adds the following debug output:
- when we try to read one tree block
- when a bio is mapped to a real device
- when a new chunk is added to chunk tree

Thanks,
Qu
Okay, here's the dmesg output from attempting to mount the filesystem:
https://gist.github.com/rkjnsn/914651efdca53c83199029de6bb61e20

I captured this on my 32-bit x86 VM, as it's much faster to rebuild
the kernel there than on my ARM board, and it fails with the same
error.
This is indeed much better.

The relevant lines are:

[   84.463147] read_one_chunk: chunk start=26207148048384 len=1073741824
num_stripes=2 type=0x14
[   84.463148] read_one_chunk:    stripe 0 phy=6477927415808 devid=5
[   84.463149] read_one_chunk:    stripe 1 phy=6477927415808 devid=4

Above is the chunk for the offending tree block.
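
As a quick sanity check of that statement, here is a minimal sketch in Python that uses only the numbers printed in the read_one_chunk line above. The helper name chunk_contains is made up for illustration and is not a btrfs function.

```python
# Values copied from the read_one_chunk debug line.
CHUNK_START = 26207148048384
CHUNK_LEN = 1073741824  # 1 GiB

# Logical address of the tree block the mount fails on.
TREE_BLOCK = 26207780683776


def chunk_contains(start: int, length: int, logical: int) -> bool:
    """Return True if `logical` lies in the half-open range [start, start + length)."""
    return start <= logical < start + length


# Offset of the tree block from the start of its chunk.
offset_in_chunk = TREE_BLOCK - CHUNK_START

print(chunk_contains(CHUNK_START, CHUNK_LEN, TREE_BLOCK))  # True
print(offset_in_chunk)  # 632635392
```

So the tree block sits 632635392 bytes into this chunk, comfortably inside its 1 GiB length.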

[   84.463724] read_extent_buffer_pages: eb->start=26207780683776 mirror=0
[   84.463731] submit_stripe_bio: rw 0 0x1000, phy=2118735708160
sector=4138155680 dev_id=3 size=16384
[   84.470793] BTRFS error (device dm-4): bad tree block start, want
26207780683776 have 3395945502747707095

But when the metadata read happens, the physical address and dev id are
completely insane.

The chunk doesn't have dev 3 in it at all, but we still get the wrong
mapping.

Furthermore, that physical address and devid belong to chunk 8614760677376,
which is a RAID5 data chunk.

So there is definitely something wrong in btrfs chunk mapping on 32bit.
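
To make the mismatch concrete, the sketch below compares the mapping one would expect with the one in the log. It rests on two assumptions that are not stated in the thread: that type=0x14 means METADATA | RAID1, so each mirror holds the block at stripe phy plus the offset into the chunk, and that "sector" in the submit_stripe_bio line counts 512-byte units. The function raid1_candidates is a hypothetical helper, not kernel code, and it says nothing about why the mapping goes wrong.

```python
SECTOR_SIZE = 512  # assumption: "sector" in the debug output is in 512-byte units

# Values copied from the read_one_chunk lines for the offending chunk.
chunk_start = 26207148048384
stripes = [(5, 6477927415808), (4, 6477927415808)]  # (devid, phy)

# Values copied from the read_extent_buffer_pages / submit_stripe_bio lines.
logical = 26207780683776
observed_devid = 3
observed_phy = 2118735708160
observed_sector = 4138155680


def raid1_candidates(chunk_start, stripes, logical):
    """Simplified RAID1 mapping: each mirror stores the block at the same chunk offset."""
    offset = logical - chunk_start
    return [(devid, phy + offset) for devid, phy in stripes]


expected = raid1_candidates(chunk_start, stripes, logical)

print(expected)  # [(5, 6478560051200), (4, 6478560051200)]
print((observed_devid, observed_phy) in expected)  # False
print(observed_phy // SECTOR_SIZE == observed_sector)  # True
```

Under those assumptions the read should have gone to devid 5 or 4 at physical 6478560051200. The logged phy and sector agree with each other, so the debug line is internally consistent, but it points at a device and address that this chunk does not contain.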

I'll craft a newer debug diff for you once I have pinned down what is
going wrong.
Sorry for the delay, mostly due to the lunar new year vacation.

Here is the new diff; it should be applied on top of the previous diff.

This new diff adds extra debug info inside __btrfs_map_block().

BTW, you only need to rebuild the btrfs module to test it; I hope this
saves you some time.

Although if I could get a small enough image to reproduce this locally,
that would be the best case...

Thanks,
Qu
Thanks,
Qu
Okay, here is the output with both patches applied:
https://gist.github.com/rkjnsn/7139eaf855687c6bd4ce371f88e28a9e

I've only run into the issue on this filesystem, which is quite large,
so I'm not sure how I would even attempt to make a reduced test case.

Thanks!