Re: raid5: I lost a XFS file system due to a minor IDE cable problem
From: David Chinner <hidden>
Date: 2007-05-29 03:28:03
Also in: linux-xfs
On Mon, May 28, 2007 at 05:45:27PM -0500, Alberto Alonso wrote:
> On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote:
> > On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
> > > I think his point was that going into a read-only mode causes a less
> > > catastrophic situation (i.e. a web server can still serve pages).
> >
> > Sure - but once you've detected one corruption or had metadata I/O
> > errors, can you trust the rest of the filesystem?
> >
> > > I think that is a valid point. Rather than shutting down the file
> > > system completely, an automatic switch to whatever mode causes the
> > > least disruption of service is always desirable.
> >
> > I consider the possibility of serving out bad data (i.e. after a
> > remount to read-only) to be the worst possible disruption of service
> > that can happen ;)
>
> I guess it does depend on the nature of the failure. A write failure
> on block 2000 does not imply corruption of the other 2TB of data.

The rest might not be corrupted, but if block 2000 is an index of some sort (i.e. metadata), you could reference any of that 2TB incorrectly and get the wrong data, write to the wrong spot on disk, etc.
> > > I personally have found the XFS file system to be great for my needs
> > > (except issues with NFS interaction, where the bug report never got
> > > answered), but that doesn't mean it cannot be improved.
> >
> > Got a pointer?
>
> I can't seem to find it. I'm pretty sure I used bugzilla to report it.
> I did find the kernel dump file though, so here it is:
>
> Oct 3 15:34:07 localhost kernel: xfs_iget_core: ambiguous vns: vp/0xd1e69c80, invp/0xc989e380

Oh, I haven't seen any of those problems for quite some time.

> = /proc/kmsg started.
> Oct 3 15:51:23 localhost kernel: Inspecting /boot/System.map-2.6.8-2-686-smp
Oh, well, yes, kernels that old did have that problem. It got fixed some time around 2.6.12 or 2.6.13 IIRC....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group