Thread (19 messages), 2 authors, 2021-02-11

Re: [PATCH 2/6] common: capture metadump output if xfs filesystem check fails

From: "Darrick J. Wong" <djwong@kernel.org>
Date: 2021-02-11 18:27:54
Also in: fstests

On Thu, Feb 11, 2021 at 08:59:58AM -0500, Brian Foster wrote:
On Tue, Feb 09, 2021 at 06:56:30PM -0800, Darrick J. Wong wrote:
From: Darrick J. Wong <redacted>

Capture metadump output when various userspace repair and checker tools
fail or indicate corruption, to aid in debugging.  We don't bother to
annotate xfs_check because it's bitrotting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 README     |    2 ++
 common/xfs |   26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/README b/README
index 43bb0cee..36f72088 100644
--- a/README
+++ b/README
@@ -109,6 +109,8 @@ Preparing system for tests:
              - Set TEST_FS_MODULE_RELOAD=1 to unload the module and reload
                it between test invocations.  This assumes that the name of
                the module is the same as FSTYP.
+	     - Set SNAPSHOT_CORRUPT_XFS=1 to record compressed metadumps of XFS
+	       filesystems if the various stages of _check_xfs_filesystem fail.
 
         - or add a case to the switch in common/config assigning
           these variables based on the hostname of your test
diff --git a/common/xfs b/common/xfs
index 2156749d..ad1eb6ee 100644
--- a/common/xfs
+++ b/common/xfs
@@ -432,6 +432,21 @@ _supports_xfs_scrub()
 	return 0
 }
 
+# Save a compressed snapshot of a corrupt xfs filesystem for later debugging.
+_snapshot_xfs() {
The term snapshot has a well known meaning. Can we just call this
_metadump_xfs()?
Ok.
+	local metadump="$1"
+	local device="$2"
+	local logdev="$3"
+	local options="-a -o"
+
+	if [ "$logdev" != "none" ]; then
+		options="$options -l $logdev"
+	fi
+
+	$XFS_METADUMP_PROG $options "$device" "$metadump" >> "$seqres.full" 2>&1
+	gzip -f "$metadump" >> "$seqres.full" 2>&1 &
Why compress in the background?
Sometimes the metadumps can become very large and I don't tend to have a
lot of space on the test appliances for storing blobs.

Also, I was under the impression that it was customary for people to
share compressed metadumps of crashes, so why not save everyone a step?

I do this in the background to avoid holding up the next fstest.
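
For illustration only, the background-compression pattern can be sketched as below. The helper names compress_in_background and wait_for_compression and the compress_pid variable are hypothetical and not part of the patch; the patch itself only backgrounds gzip and never waits for it.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the patch: compress a file in the
# background so the caller can move on to the next test immediately.
compress_in_background() {
	local file="$1"
	local log="$2"

	# -f overwrites a stale .gz left over from an earlier run.
	gzip -f "$file" >> "$log" 2>&1 &
	# Remember the job's PID so a later step can wait for it.
	compress_pid=$!
}

# Block until the background compression has finished.
wait_for_compression() {
	[ -n "$compress_pid" ] && wait "$compress_pid"
}
```

A caller that needs the finished .gz file (for example, before copying results off the test machine) would call wait_for_compression first.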
I wonder if we should just skip the
compression step since this requires an option to enable in the first
place..
Seeing as it's optional, I think that's all the more reason to compress.
+}
+
 # run xfs_check and friends on a FS.
 _check_xfs_filesystem()
 {
...
@@ -540,6 +564,8 @@ _check_xfs_filesystem()
 			cat $tmp.repair				>>$seqres.full
 			echo "*** end xfs_repair output"	>>$seqres.full
 
+			test "$SNAPSHOT_CORRUPT_XFS" = "1" && \
+				_snapshot_xfs "$seqres.rebuildrepair.md" "$device" "$2"
Why do we collect so many metadump images? Shouldn't all but the last
TEST_XFS_REPAIR_REBUILD thing not modify the fs? If so, it seems like we
should be able to collect one image (and perhaps just call it
"$seqres.$device.md") if any of the first several checks flag a problem.
Yes, the number of metadumps collected can be reduced to two.  One if
scrub or logprint or repair -n fail, and a second one if the user set
TEST_XFS_REPAIR_REBUILD=1 and either the repair or the repair -n fail.

Will change that.
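
A rough sketch of the two-image scheme described above, not the actual follow-up patch. The function collect_metadumps, its exit-status arguments, the stub body of _metadump_xfs, and the .scan.md and .rebuild.md suffixes are all hypothetical. It also assumes that the second "repair -n" means a read-only re-check run after the rebuild.

```shell
#!/bin/bash
# Stub standing in for the real metadump helper; it only reports the
# image it would have written. (Hypothetical body.)
_metadump_xfs() {
	echo "metadump: $1"
}

# Decide which metadumps to capture. Arguments are exit statuses
# (0 = pass) for: scrub, logprint, repair -n, rebuild repair, and the
# repair -n re-check after the rebuild; the last is a file name prefix.
collect_metadumps() {
	local scrub_rc="$1" logprint_rc="$2" repair_n_rc="$3"
	local rebuild_rc="$4" recheck_rc="$5"
	local prefix="$6"

	# Nothing to do unless the user opted in.
	test "$SNAPSHOT_CORRUPT_XFS" = "1" || return 0

	# First image: any of the non-modifying checks failed.
	if [ "$scrub_rc" -ne 0 ] || [ "$logprint_rc" -ne 0 ] ||
	   [ "$repair_n_rc" -ne 0 ]; then
		_metadump_xfs "$prefix.scan.md"
	fi

	# Second image: only with TEST_XFS_REPAIR_REBUILD=1, and only if the
	# rebuild or its follow-up repair -n failed.
	if [ "$TEST_XFS_REPAIR_REBUILD" = "1" ]; then
		if [ "$rebuild_rc" -ne 0 ] || [ "$recheck_rc" -ne 0 ]; then
			_metadump_xfs "$prefix.rebuild.md"
		fi
	fi
}
```

With SNAPSHOT_CORRUPT_XFS=1, a scrub failure alone yields only the first image, and a rebuild failure alone yields only the second.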

--D
Brian
 			ok=0
 		fi
 		rm -f $tmp.repair