Thread: 13 messages, 3 authors, 2012-10-29

Re: Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

From: Eric Sandeen <hidden>
Date: 2012-10-27 21:34:22


On 10/27/12 4:29 PM, Nix wrote:
> On 27 Oct 2012, Eric Sandeen spake thusly:
>
>> On 10/27/12 4:21 PM, Nix wrote:
>>> On 27 Oct 2012, Eric Sandeen verbalised:
>>>> That's what we needed.  Woulda been great a few days ago ;)
>>> *wince* sorry!
>> It's ok, I know sometimes this testing takes time.
> It took much less time once I figured out that umount -l at the last
> moment before reboot would reliably corrupt one filesystem and one
> filesystem only. Before that, I was having to fsck 2.5Tb of filesystems
> on every test run, just in case the latest reboot had zapped them too...
>
>> It has exposed the fact that we are not doing a good job
>> regression testing all of the available configurations.
> This is the Linux kernel: what was it Linus joked years ago, users are
> the test load? I'm impressed you have any regression testing at all, let
> alone as much as you seem to. :P :P

Well, that should not be the case, or at least minimized.  It takes
constant vigilance...

> (But, seriously, fsstress is a wonderful thing. And the kernel's test
> culture *is* improving, and I'm happy to see filesystem hackers in the
> front line.)

I've been testing with a hacked up devicemapper target which creates
a "dirty" snapshot which requires a replay; saves the actual power
drop & restore cycle, and I could repro the journal_checksum bug
right off.

XFS has an ioctl to make this easy in regression testing, and several
tests in xfstests do cover xfs journal recovery.  We need
to add such a thing to ext4.  Not being able to programmatically
test recovery is a problem.

-Eric