Re: mdraid6 problem post 3.5.0

From: NeilBrown <hidden>
Date: 2012-08-17 22:59:07
Also in: lkml

On Fri, 17 Aug 2012 18:30:11 -0400 John Drescher [off-list ref] wrote:
> For the last few weeks I have been doing some reliability testing on a
> mdraid6 array. One of my tests was to physically hot-remove a raid
> member disk. This worked flawlessly with gentoo-sources-3.5.0 for the
> 5 or so times I tried it with my 12 disk + 1 spare mdraid6 array.
> A few seconds after pulling a disk, the array automatically
> rebuilds with a spare, and after finishing all data checks out via
> a btrfs scrub. However, trying this with gentoo-sources-3.5.2 or the
> latest kernel.org git sources, the machine does not start the rebuild,
> and any access to /proc/mdstat or any disk access that is not in cache
> for that array just leads to a long (possibly infinite) wait,
> eventually forcing me to use the reset button when the sysrq
> key combinations fail to shut down the machine. I do see some kernel
> debug messages on the console (alt-ctrl-f12) but I was unable to save
> them to copy.
>
> Is this a known problem? If not, it may be possible that I could bisect
> this next week to the patch that causes this behavior.

Thanks for the report.

The problem is not known to me.  There are no changes to raid6 between 3.5.0
and 3.5.2, so unless gentoo broke something (unlikely) this is very strange.

A digital photo of the debug messages might be useful if you can catch that.
Setting up a network console to capture messages isn't too hard if you have
another machine with a wired network connection.
See Documentation/networking/netconsole.txt
If you can set that up, then alt-sysrq-T might provide useful info.
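
As a rough illustration of the receiving side of such a setup: netconsole
sends kernel messages as plain UDP datagrams, by default to port 6666, so the
second machine only needs something listening on that port. The sketch below
is one possible listener, not part of the original advice; the function names
and the log file name are made up for the example. On the machine under test,
the documented module parameter has the form
netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr];
the actual addresses and interface depend on the local network and are not
given here.

```python
import socket

# 6666 is netconsole's default target port; change it if the sender
# was configured with a different tgt-port.
NETCONSOLE_PORT = 6666


def open_listener(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Return a UDP socket bound to host:port (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, log_path, max_packets=None):
    """Append every received datagram to log_path.

    Runs until interrupted when max_packets is None; otherwise stops
    after max_packets datagrams. Returns the number of datagrams written.
    """
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _addr = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current in case the listener is killed
            count += 1
    return count


# Example use on the receiving machine (blocks until Ctrl-C):
#   capture(open_listener(), "netconsole.log")
```

With a listener like this running before the disk is pulled, the debug
messages and any alt-sysrq-T output that reaches the network would end up in
the log file rather than only on the console.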


NeilBrown
