Re: kernel BUG at fs/btrfs/extent-tree.c:5038 (linux 3.4.7)
From: Stefan Behrens <hidden>
Date: 2012-08-08 16:06:54
On Wed, 8 Aug 2012 16:45:57 +0200, David Sterba wrote:
On Sun, Aug 05, 2012 at 04:11:47PM +0200, Olivier Bonvalet wrote:
Aug 5 16:10:12 backup2 kernel: [ 58.674758] parent transid verify failed on 615015833600 wanted 110423 found 110424
1st mirror fails verify_parent_transid().
Aug 5 16:10:12 backup2 kernel: [ 58.675090] parent transid verify failed on 615015833600 wanted 110423 found 110424
2nd mirror fails verify_parent_transid().
Aug 5 16:10:12 backup2 kernel: [ 58.675523] btrfs read error corrected: ino 1 off 615015833600 (dev /dev/mapper/vg--backupplug-backup sector 1209083504)
That's a bug. It is wrong to ignore the previous results from verify_parent_transid() and to call repair_eb_io_failure() which rewrites one mirror and claims to have corrected an error. But it's not a major issue, just a misleading message in the kernel log and a disk write operation which does not repair anything.
This looks strange: the corrupted block belongs to metadata, and I assume you have the DUP profile, so there is a good copy that can be used instead. The error message confirms that, but ...
Aug 5 16:10:12 backup2 kernel: [ 58.675536] Failed to read block groups: -5
That's correct, because the UPTODATE flag in the extent is not set (verify_parent_transid() clears it when it detects an error).
... ? -5 means EIO, which is returned when a block cannot be read, so unless there is a different reason for it, this looks like a missed opportunity to fix an error and continue. The same error messages are present in the logs from the 3.4 version.
Aug 5 16:10:12 backup2 kernel: [ 58.704720] btrfs: open_ctree failed
The summary is that the block was not correctable: both mirrors had the same old transid. The bug is that the call to repair_io_failure() should not have been done, because verify_parent_transid() indicated errors. I'll prepare a patch for it. Changing btree_read_extent_buffer_pages() to set ret to -EIO if verify_parent_transid() fails should fix the issue.