Thread: 17 messages, 7 authors, 2011-05-02

Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

From: Ben Hutchings <hidden>
Date: 2011-05-02 00:00:57

On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings [off-list ref] wrote:
> > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > consisting of:
> > >
> > >   ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > >
> > > I use the raid1 as a backup.  The raid1 operates normally in degraded
> > > mode.  For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > then fail/remove the external hdd.
> >
> > Well, this is not expected to work.  Possibly the hot-addition of a disk
> > with different bio restrictions should be rejected.  But I'm not sure,
> > because it is safe to do that if there is no mounted filesystem or
> > stacking device on top of the RAID.
>
> Hi, Ben.  Can you explain why this is not expected to work?  Which part
> exactly is not expected to work and why?

Adding another type of disk controller (USB storage versus whatever the
SSD interface is) to a RAID that is already in use.

> > I would recommend using filesystem-level backup (e.g. dirvish or
> > backuppc).  Aside from this bug, if the SSD fails during a RAID resync
> > you will be left with an inconsistent and therefore useless 'backup'.
>
> I appreciate your recommendation, but it doesn't really have anything to
> do with this bug report.  Unless I am doing something that is
> *expressly* not supposed to work, then it should work, and if it doesn't
> then it's either a bug or a documentation failure (ie. if this setup is
> not supposed to work then it should be clearly documented somewhere what
> exactly the problem is).

The normal state of a RAID set is that all disks are online.  You have
deliberately turned this on its head; the normal state of your RAID set
is that one disk is missing.  This is such a basic principle that most
documentation won't mention it.

> > The block layer correctly returns an error after logging this message.
> > If it's due to a read operation, the error should be propagated up to
> > the application that tried to read.  If it's due to a write operation, I
> > would expect the error to result in the RAID becoming desynchronised.
> > In some cases it might be propagated to the application that tried to
> > write.
>
> Can you say what is "correct" about the returned error?  That's what I'm
> still not understanding.  Why is there an error and what is it coming
> from?

The error is that you changed the I/O capabilities of the RAID while it
was already in use.  But what I was describing as 'correct' was that an
error code was returned, rather than the error condition only being
logged.  If the error condition is not properly propagated then it could
lead to data loss.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
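
The failure mode discussed above can be illustrated with a small toy model. This is only a sketch, not kernel code: the class names (`Member`, `Raid1`, `StackedDevice`, `BioTooBig`) are hypothetical, and the per-device limits of 248 and 240 sectors are assumptions read off the log message in the subject line. It also assumes that the layer stacked on the array (dm-crypt, LVM) records the array's request-size limit once at setup time and keeps using it, which is one way to read "changed the I/O capabilities of the RAID while it was already in use".

```python
from dataclasses import dataclass


class BioTooBig(Exception):
    """Raised when a request exceeds the device's current size limit."""


@dataclass
class Member:
    name: str
    max_sectors: int  # largest request this member accepts (assumed value)


@dataclass
class Raid1:
    name: str
    members: list

    @property
    def max_sectors(self):
        # A mirror can only accept what its most restrictive member accepts.
        return min(m.max_sectors for m in self.members)

    def hot_add(self, member):
        # Adding a more restrictive member lowers the array's limit.
        self.members.append(member)

    def submit(self, nr_sectors):
        limit = self.max_sectors
        if nr_sectors > limit:
            # Return an error to the caller instead of only logging it.
            raise BioTooBig(
                f"bio too big device {self.name} ({nr_sectors} > {limit})"
            )
        return nr_sectors


class StackedDevice:
    """Toy stand-in for a layer above the array (dm-crypt, LVM)."""

    def __init__(self, lower):
        self.lower = lower
        # Assumption: the limit is sampled once, when the stack is set up.
        self.cached_max_sectors = lower.max_sectors

    def write(self, total_sectors):
        # Split the write into requests no larger than the cached limit.
        sizes = []
        remaining = total_sectors
        while remaining > 0:
            n = min(remaining, self.cached_max_sectors)
            sizes.append(self.lower.submit(n))
            remaining -= n
        return sizes
```

In this model, a `StackedDevice` built on a one-member array with a 248-sector limit keeps issuing 248-sector requests after a 240-sector member is hot-added, and `submit` then raises `BioTooBig` with the same text as the kernel message. With nothing stacked on top there is no stale cached limit, which matches the remark that the hot-add is safe when no filesystem or stacking device is in use.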
