Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and... | linux-ext4

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

From: Eric Sandeen <hidden>
Date: 2012-10-27 21:34:22


On 10/27/12 4:29 PM, Nix wrote:

On 27 Oct 2012, Eric Sandeen spake thusly:

On 10/27/12 4:21 PM, Nix wrote:

On 27 Oct 2012, Eric Sandeen verbalised:

That's what we needed.  Woulda been great a few days ago ;)

*wince* sorry!

It's ok, I know sometimes this testing takes time.

It took much less time once I figured out that umount -l at the last
moment before reboot would reliably corrupt one filesystem and one
filesystem only. Before that, I was having to fsck 2.5TB of filesystems
on every test run, just in case the latest reboot had zapped them too...

It has exposed the fact that we are not doing a good job
regression testing all of the available configurations.

This is the Linux kernel: what was it Linus joked years ago, users are
the test load? I'm impressed you have any regression testing at all, let
alone as much as you seem to. :P :P

Well, that should not be the case, or at least it should be minimized.
It takes constant vigilance...

(But, seriously, fsstress is a wonderful thing. And the kernel's test
culture *is* improving, and I'm happy to see filesystem hackers in the
front line.)

I've been testing with a hacked-up device-mapper target that creates
a "dirty" snapshot requiring a journal replay; it saves the actual
power-drop-and-restore cycle, and I could reproduce the journal_checksum
bug right off.
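Eric's target isn't shown in the thread. As an illustration only, here is a minimal Python sketch of the same idea built on a different, stock device-mapper target, dm-flakey: with its drop_writes feature, writes are reported as successful but never reach the disk, so the underlying device stays in whatever state it had when the table was swapped, much as after a power cut. The device name flakey-test and the helper function names are hypothetical; the table layout assumed here is dm-flakey's "<dev> <offset> <up interval> <down interval> [<num_features> <features>]".

```python
# Sketch (not Eric's target): build dm-flakey tables and dmsetup argv lists
# that simulate a power cut. Nothing is executed here; a root-only harness
# would pass each argv list to subprocess.run(cmd, check=True).
from typing import List

DM_NAME = "flakey-test"  # hypothetical device-mapper device name


def flakey_table(dev: str, sectors: int, drop_writes: bool) -> str:
    """Return a dm-flakey table line covering sectors [0, sectors) of dev."""
    if drop_writes:
        # up interval 0s, down interval 180s: the device is "down" at once,
        # and drop_writes makes writes succeed without reaching the disk.
        return f"0 {sectors} flakey {dev} 0 0 180 1 drop_writes"
    # up interval 180s, down interval 0s: acts as a plain pass-through.
    return f"0 {sectors} flakey {dev} 0 180 0"


def create_cmd(dev: str, sectors: int) -> List[str]:
    """dmsetup invocation that creates the pass-through device."""
    return ["dmsetup", "create", DM_NAME, "--table",
            flakey_table(dev, sectors, drop_writes=False)]


def power_cut_cmds(dev: str, sectors: int) -> List[List[str]]:
    """Swap in the write-dropping table without flushing the filesystem."""
    return [
        # --nolockfs: don't freeze the filesystem, so nothing gets flushed
        ["dmsetup", "suspend", "--nolockfs", DM_NAME],
        ["dmsetup", "load", DM_NAME, "--table",
         flakey_table(dev, sectors, drop_writes=True)],
        ["dmsetup", "resume", DM_NAME],
    ]
```

After the switch, a harness would unmount (the unmount's own writes are dropped too, so the on-disk journal stays marked as needing recovery), reload the pass-through table, and mount again so ext4 has to replay its journal. This is a substitute technique, not a description of Eric's hacked-up target.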

XFS has an ioctl to make this easy in regression testing, and several
tests in xfstests do cover XFS journal recovery.  We need
to add such a thing to ext4.  Not being able to programmatically
test recovery is a problem.

-Eric
