Thread: 20 messages, 4 authors, 2007-05-29

Re: raid5: I lost a XFS file system due to a minor IDE cable problem

From: Pallai Roland <hidden>
Date: 2007-05-28 11:17:31
Also in: linux-xfs

On Monday 28 May 2007 04:17:18 David Chinner wrote:
On Mon, May 28, 2007 at 03:50:17AM +0200, Pallai Roland wrote:
On Monday 28 May 2007 02:30:11 David Chinner wrote:
On Fri, May 25, 2007 at 04:35:36PM +0200, Pallai Roland wrote:
...and I've been spammed with such messages. This "internal error" isn't a good
reason to shut down the file system?
Actually, that error does shut the filesystem down in most cases. When
you see that output, the function is returning -EFSCORRUPTED. You've
got a corrupted freespace btree.

The reason why you get spammed is that this is happening during
background writeback, and there is no one to return the -EFSCORRUPTED
error to. The background writeback path doesn't specifically detect
shut down filesystems or trigger shutdowns on errors because that
happens in different layers so you just end up with failed data writes.
These errors will occur on the next foreground data or metadata
allocation and that will shut the filesystem down at that point.
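
A toy model of the behaviour described above may help; it is only a sketch and is not XFS code. Every name in it (toy_fs, toy_alloc_extent, toy_background_writeback, toy_foreground_alloc, TOY_EFSCORRUPTED and its value) is hypothetical and chosen for illustration. The background path has no caller to hand the error to, so it only logs and fails the write; the foreground path can return the error and mark the filesystem shut down.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative error code only; not taken from any kernel header. */
#define TOY_EFSCORRUPTED 990

/* Hypothetical, minimal filesystem state for this sketch. */
struct toy_fs {
    bool corrupt_freespace; /* the free space btree is damaged */
    bool shut_down;         /* set once a shutdown has been triggered */
    int  failed_writes;     /* data writes dropped by background writeback */
};

/* Allocation fails with the corruption error while the btree is damaged. */
static int toy_alloc_extent(const struct toy_fs *fs)
{
    return fs->corrupt_freespace ? -TOY_EFSCORRUPTED : 0;
}

/*
 * Background writeback: nobody is waiting for a return value, so the
 * error is logged, the write is counted as failed, and the filesystem
 * stays up. Each call logs again, which is where the log spam comes from.
 */
static void toy_background_writeback(struct toy_fs *fs)
{
    int err = toy_alloc_extent(fs);

    if (err) {
        fprintf(stderr, "toy: internal error %d in writeback\n", err);
        fs->failed_writes++;
    }
}

/*
 * Foreground allocation: the error can be returned to a caller, and
 * in this sketch the corruption error also triggers the shutdown.
 */
static int toy_foreground_alloc(struct toy_fs *fs)
{
    int err;

    if (fs->shut_down)
        return -EIO;

    err = toy_alloc_extent(fs);
    if (err == -TOY_EFSCORRUPTED)
        fs->shut_down = true;
    return err;
}
```

In this model, any number of toy_background_writeback() calls on a corrupt toy_fs only increments failed_writes and leaves shut_down false; the first toy_foreground_alloc() call is what sets shut_down, and later foreground calls return -EIO.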

I'm not sure that we should be ignoring EFSCORRUPTED errors here; maybe
in this case we should be shutting down the filesystem.  That would
certainly cut down on the spamming and would not appear to change
any other behaviour....
 If I remember correctly, my file system wasn't shut down at all; it
was "writeable" for the whole night, and yafc slowly "wrote" files to it.
Maybe all write operations had failed, but yafc doesn't warn.
So you never created new files or directories, unlinked files or
directories, did synchronous writes, etc? Just had slowly growing files?
 I just overwrote badly downloaded files.
 Spamming is just annoying when we need to find out what went wrong (my
kernel.log is 300 MB), but for data security I think it's important to
react to an EFSCORRUPTED error in any case. Please consider this.
The filesystem has responded correctly to the corruption in terms of
data security (i.e. failed the data write and warned noisily about
it), but it probably hasn't done everything it should....

Hmmmm. A quick look at the linux code makes me think that background
writeback on linux has never been able to cause a shutdown in this
case. However, the same error on Irix will definitely cause a
shutdown, though....
 I hope Linux will follow Irix, that's a consistent standpoint.


 David, do you have a plan to implement your "reporting raid5 block layer" idea?
No one else seems to care about this silent data loss on raid5 arrays that
failed temporarily (cable, power), as far as I can see; I really hope you do at least!


--
 d