Thread: 16 messages, 4 authors, 2017-09-01

Re: corrupt xfs log

From: Brian Foster <hidden>
Date: 2017-08-30 14:58:13

On Mon, Aug 21, 2017 at 10:24:32PM +0200, Ingard - wrote:
On Mon, Aug 21, 2017 at 5:51 PM, Brian Foster [off-list ref] wrote:
On Mon, Aug 21, 2017 at 02:08:43PM +0200, Ingard - wrote:
On Fri, Aug 18, 2017 at 2:17 PM, Brian Foster [off-list ref] wrote:
On Fri, Aug 18, 2017 at 07:02:24AM -0500, Bill O'Donnell wrote:
On Fri, Aug 18, 2017 at 01:56:31PM +0200, Ingard - wrote:
After a server crash we've encountered a corrupt xfs filesystem. When
trying to mount said filesystem normally the system hangs.
This was initially on an Ubuntu Trusty server with a 3.13 kernel and
xfsprogs 3.1.9.

We've installed a newer kernel (4.4.0-92) and compiled xfsprogs
v4.12.0 from source. We're still not able to mount the filesystem (and
replay the log) normally.
We are able to mount it -o ro,norecovery, but we're reluctant to run
xfs_repair -L without trying everything we can first. The filesystem
is browsable, apart from a few paths which give the error "Structure
needs cleaning".

Does anyone have any advice as to how we might recover/repair the
corrupt log so we can replay it? Or is xfs_repair -L the only way
forward?
Can you try xfs_repair -n (only scans the fs and reports what repairs
would be made)?
An xfs_metadump of the fs might be useful as well. Then we can see if we
can reproduce the mount hang on latest kernels and if so, potentially
try and root cause it.

Brian
Here is a link for the metadump :
https://www.jottacloud.com/p/ingardme/95ec2e45ba80431d962345981d38bdff
This points to a 29GB image file, apparently uncompressed..? Could you
upload a compressed file? Thanks.
Hi. Sorry about that. Didn't realize the output would be compressible.
Here is a link to the compressed tgz (6G)
https://www.jottacloud.com/p/ingardme/cac6939649e14b98b928647f5222a2ae
I finally played around with this image a bit. Note that mount does not
hang on latest kernels. Instead, log recovery emits a torn write message
due to a bad crc at the head of the log and then ultimately fails due to
a bad crc at the tail of the log. I ran a couple of experiments to skip
the bad crc records and/or to completely ignore all bad crcs, and both
still either fail to mount (due to other corruption) or continue to show
corruption in the recovered fs.

It's not clear to me what would have caused this corruption or log
state. Have you encountered any corruption before? If not, is this kind
of crash or unclean shutdown of the server an uncommon event?

That aside, I think the best course of action is to run 'xfs_repair -L'
on the fs. I ran a v4.12 version against the metadump image and it
successfully repaired the fs. I've attached the repair output for
reference, but I would recommend that you first restore your metadump
to a temporary location, attempt to repair that and examine the results
before repairing the original fs. Note that the metadump will not have
any file content, but will represent which files might be cleared, moved
to lost+found, etc.
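
The rehearsal described above could look roughly like the following
sketch. It is an illustration under assumptions, not output or commands
taken from this thread: the helper names (run, rehearse_repair), the
DRY_RUN switch and all paths are hypothetical placeholders. It only ever
touches a restored image file, never the real device, and by default it
just prints the commands it would run.

```shell
#!/bin/sh
# Rehearse 'xfs_repair -L' on a restored metadump image, not on the real fs.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to
# actually execute them (as root, with enough scratch space for the image).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Arguments (all placeholders): metadump file, scratch image, mount point.
rehearse_repair() {
    dump=$1
    img=$2
    mnt=$3

    # Rebuild a filesystem image from the (already uncompressed) metadump.
    run xfs_mdrestore "$dump" "$img" || return 1

    # No-modify pass: report what would be repaired. A non-zero exit here
    # means corruption was detected, which is expected, so keep going.
    run xfs_repair -f -n "$img" || true

    # Zero the log and repair the scratch image (-f: target is a regular file).
    run xfs_repair -f -L "$img" || return 1

    # Mount the repaired image read-only to see which files were cleared
    # or moved to lost+found. The metadump carries no file contents.
    run mount -o loop,ro "$img" "$mnt"
}
```

For example, `rehearse_repair fs.metadump /scratch/fs.img /mnt/scratch`
prints the four commands in order. Only once the result on the scratch
image looks acceptable would the same 'xfs_repair -L' be pointed at the
original device.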

Brian
Brian
And the repair -n output :
https://www.jottacloud.com/p/ingardme/0205c6ca6f7e495ebcda5f255b96f63d

kind regards
ingard
Thanks-
Bill


Excerpt from kern.log:
2017-08-17T13:40:41.122121+02:00 dn-238 kernel: [  294.300347] XFS
(sdd1): Mounting V4 filesystem in no-recovery mode. Filesystem will be
inconsistent.

2017-08-17T17:04:54.794194+02:00 dn-238 kernel: [12548.400260] XFS
(sdd1): Metadata corruption detected at xfs_inode_buf_verify+0x6f/0xd0
[xfs], xfs_inode block 0x81c9c210
2017-08-17T17:04:54.794216+02:00 dn-238 kernel: [12548.400342] XFS
(sdd1): Unmount and run xfs_repair
2017-08-17T17:04:54.794218+02:00 dn-238 kernel: [12548.400374] XFS
(sdd1): First 64 bytes of corrupted metadata buffer:
2017-08-17T17:04:54.794220+02:00 dn-238 kernel: [12548.400418]
ffff880171fff000: 3f 1a 33 54 5b 55 85 0b 7c f5 c6 d5 cf 51 47 41
?.3T[U..|....QGA
2017-08-17T17:04:54.794222+02:00 dn-238 kernel: [12548.400473]
ffff880171fff010: 97 ba ba 03 5c e4 02 7a e6 bc fb 5d f1 72 db c1
....\..z...].r..
2017-08-17T17:04:54.794223+02:00 dn-238 kernel: [12548.400527]
ffff880171fff020: c8 ad 3a 76 c7 e4 20 92 88 a2 35 0c 1f 36 cf b5
..:v.. ...5..6..
2017-08-17T17:04:54.794226+02:00 dn-238 kernel: [12548.400581]
ffff880171fff030: 8a bc 42 75 86 50 a0 a2 be 2c 2d 99 96 2d e1 ee
..Bu.P...,-..-..

kind regards
ingard
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html