Thread (6 messages), 2 authors, 2010-08-11

Re: sdc1 does not have a valid v0.90 superblock, not importing!

From: Jon Hardcastle <hidden>
Date: 2010-08-11 15:30:31

--- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:

> From: Neil Brown <redacted>
> Subject: Re: sdc1 does not have a valid v0.90 superblock, not importing!
> To: Jon@eHardcastle.com
> Cc: jd_hardcastle@yahoo.com, linux-raid@vger.kernel.org
> Date: Wednesday, 11 August, 2010, 12:34
>
> On Wed, 11 Aug 2010 04:19:07 -0700 (PDT) Jon Hardcastle [off-list ref] wrote:
>> --- On Wed, 11/8/10, Neil Brown <neilb@suse.de> wrote:
>>> From: Neil Brown <redacted>
>>> Subject: Re: sdc1 does not have a valid v0.90 superblock, not importing!
>>> To: Jon@eHardcastle.com
>>> Cc: jd_hardcastle@yahoo.com, linux-raid@vger.kernel.org
>>> Date: Wednesday, 11 August, 2010, 12:06
>>>
>>> On Wed, 11 Aug 2010 02:55:44 -0700 (PDT) Jon Hardcastle [off-list ref] wrote:
>>>> (my first attempt appears to have been bounced as the spam checker
>>>> thought it had HTML in it?!)
>>>
>>> odd... came through ok for me the first time.
>>>> Help!
>>>>
>>>> Long story short - I was watching a movie off my RAID6 array. Got a
>>>> smart error warning
>>>>
>>>> Aug 10 22:00:07 mangalore kernel: raid5: cannot start dirty degraded array for md4

>>> This is the current problem.  The array is dirty and degraded so there
>>> could theoretically be undetectable corruption.  The chance is quite low
>>> but it is there, so md won't start without you acknowledging the risk by
>>> giving the --force flag to mdadm --assemble.
>>> Only do that if you are confident that your hardware is working correctly.
>>
>> Well, I am reasonably sure the controller came adrift the first time. When
>> I reseated it I stopped getting 100's of errors, and it has survived 1.5
>> badblocks checks. It is being held in place by one of those bars you press
>> down (it does all the expansion cards in one go), except I don't think it
>> is very good. I will screw it down.

>>>> It appears sdc has an invalid superblock?
>>>>
>>>> This is the 'examine' from sdc1 (note the checksum)
>>>>
>>>> /dev/sdc1:
>>>> .....
>>>>       Checksum : b335b4e3 - expected b735b4e3

>>> Single bit error.  That isn't good as it means some bit of memory or some
>>> bit on some bus somewhere cannot be trusted.
>>> It could be a transient thing and will never happen again.  Or maybe not.
>>> Given the smart errors and the fact that you have had problems with the
>>> drive before it seems very likely that the problem is in that drive.  I
>>> suggest unplugging it and leaving it unplugged.  Some memory buffer in the
>>> drive is probably marginal.  I don't think they use ECC memory.
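
The "single bit error" reading of the two checksum values quoted above can be checked by XORing them. This is only a small sketch: the two hex values are copied from the --examine output in this thread, and the variable names are made up for illustration.

```python
# Values copied from the `mdadm --examine` output quoted in this thread.
stored = 0xB335B4E3      # checksum found in the sdc1 superblock
expected = 0xB735B4E3    # checksum mdadm computed and expected

diff = stored ^ expected            # set bits mark where the two values differ
flipped = bin(diff).count("1")      # how many bits differ
position = diff.bit_length() - 1    # index of the highest differing bit (0 = LSB)

print(f"xor = {diff:#010x}, differing bits = {flipped}, bit index = {position}")
```

It prints `xor = 0x04000000, differing bits = 1, bit index = 26`, so the stored and expected checksums differ in exactly one bit, which is consistent with the diagnosis above. It does not say where the bit was flipped (drive buffer, bus, or memory).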

>> Could this be a result of me forcing a power off when the drive was
>> causing problems?
>
> Probably not.  Forcing a power off may well have left the array 'dirty' so
> that it wouldn't assemble, but is fairly unlikely to corrupt data within a
> block.
>
>> What are the dangers of removing it, zeroing the superblock and re-adding?
>> Is it MORE dangerous than leaving a RAID 6 degraded for a few days?
>
> In general, I would say the chance of a known-bad drive causing problems is
> greater than the chance of fewer known-good drives causing problems.
> But then you seem to think it isn't the drive, it was the controller and
> that is fixed...
>
> This is really about your level of trust in the hardware.
> If you trust sdc as much as the others, include it in the array.
> If you don't, then don't.
>
> NeilBrown


>>>> Anyways... I am ASSUMING mdadm has not assembled the array to be on the
>>>> safe side? I have not done anything.. no force... no assume clean.. I
>>>> wanted to be sure?
>>>
>>> You assume correctly.
>>>
>>>> Should I remove sdc1 from the array? It should then assemble? I have 2
>>>> spare drives that I am getting around to using to replace this drive and
>>>> the other 500GB.. so should I remove sdc1... and try and re-add or just
>>>> put the new drive in?
>>>>
>>>> atm I have 'stop'ped the array and got badblocks running....
>>>
>>> Remove sdc and assemble the array with --force, and get a new device to
>>> replace /dev/sdc as soon as possible.
>>
>> Thanks Neil - I panic'd as previously it has mounted the array in a
>> degraded state... but previously the drive has disappeared completely...
>> whereas in this case it is present... but wrong!
>>
>>> NeilBrown
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

      
For the benefit of those who follow: I assembled the array by specifying exactly the drives I wanted in it. Once it was assembled, I could confidently zero the superblock on the troublesome drive without it rearing its ugly head again!

I now have a degraded RAID 6 array, one drive down.
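
For anyone wanting to repeat this, the two steps might look roughly like the sketch below. It only builds and prints the command lines and does not run anything. /dev/md4 and /dev/sdc1 come from this thread; the list of good member devices is hypothetical, since the exact command used is not shown here, and --force is included on the assumption that the array was still dirty and degraded as discussed above.

```python
import shlex

ARRAY = "/dev/md4"      # array name from the kernel log quoted in this thread
SUSPECT = "/dev/sdc1"   # member with the bad superblock checksum
# Hypothetical list of the remaining members; substitute the real devices.
GOOD_MEMBERS = ["/dev/sda1", "/dev/sdb1", "/dev/sdd1", "/dev/sde1"]


def build_commands(array, good_members, suspect):
    """Return the mdadm command lines as argument lists, without running them."""
    if suspect in good_members:
        raise ValueError("suspect device must not be in the assemble list")
    # Step 1: assemble from an explicit device list that leaves the suspect out.
    assemble = ["mdadm", "--assemble", "--force", array, *good_members]
    # Step 2: once the array is up without it, wipe the suspect's md superblock.
    zero = ["mdadm", "--zero-superblock", suspect]
    return [assemble, zero]


for cmd in build_commands(ARRAY, GOOD_MEMBERS, SUSPECT):
    print(shlex.join(cmd))
```

Naming the members explicitly is what keeps the suspect drive out of the assembled array. Zeroing its superblock afterwards stops it from being picked up as a member of this array on a later scan.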


      