Re: generic/418 regression seen on 5.12-rc3
From: "Theodore Ts'o" <tytso@mit.edu>
Date: 2021-03-18 19:42:53
On Thu, Mar 18, 2021 at 02:16:13PM -0400, Eric Whitney wrote:
As mentioned in today's ext4 concall, I've seen generic/418 fail from time to
time when run on 5.12-rc3 and 5.12-rc1 kernels. This first occurred when
running the 1k test case using kvm-xfstests. I was then able to bisect the
failure to a patch landed in the -rc1 merge window:
(bd8a1f3655a7) mm/filemap: support readpage splitting a page
Typical test output resulting from a failure looks like:
QA output created by 418
+cmpbuf: offset 0: Expected: 0x1, got 0x0
+[6:0] FAIL - comparison failed, offset 3072
+diotest -w -b 512 -n 8 -i 4 failed at loop 0
Silence is golden
...
I've also been able to reproduce the failure on -rc3 in the 4k test case as
well. The failure frequency there was 10 out of 100 runs. It was anywhere
from 2 to 8 failures out of 100 runs in the 1k case.

FWIW, testing on a kernel which is -rc2 based (ext4.git's tip), I wasn't
able to see a failure with gce-xfstests in the ext4/4k, ext4/1k, and
xfs/1k test scenarios.  This may be because of the I/O timing for the
persistent disk block device in GCE, or differences in the number of
CPUs or amount of memory available --- or in the kernel configuration
that was used to build it.

I'm currently retrying with -rc3, with and without the kernel debug
configs, to see if that makes any difference...

- Ted
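
A rough way to judge how much a clean run means at the failure rates reported above (2 to 10 failures per 100 runs). This is only a sketch: it assumes every run fails independently at a fixed rate, which a timing-dependent failure may not satisfy. The function names are made up for illustration, and the run counts are not the ones actually used on GCE.

```python
import math

def prob_no_failures(fail_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass at the given per-run failure rate."""
    return (1.0 - fail_rate) ** runs

def runs_needed(fail_rate: float, confidence: float = 0.95) -> int:
    """Smallest run count that yields at least one failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))

# Per-run failure rates taken from the report: 2%..8% (1k case), 10% (4k case).
for rate in (0.02, 0.08, 0.10):
    print(f"rate={rate:.2f}  "
          f"P(0 failures in 100 runs)={prob_no_failures(rate, 100):.4g}  "
          f"runs for a 95% chance of a hit={runs_needed(rate)}")
```

Under those assumptions, a few dozen clean runs would be fairly convincing at the 10% rate. At the 2% rate, even 100 clean runs leave a real chance that the failure was simply missed.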