Thread (6 messages, 3 authors, 2014-03-05)

Re: RAID5 hard freeze

From: NeilBrown <hidden>
Date: 2014-02-25 02:58:09

On Tue, 25 Feb 2014 00:01:42 +0200 Denis Golovan [off-list ref]
wrote:
> Hi all
>
> I am struggling to diagnose a strange freeze of a software RAID5 array.
> My RAID5 consists of 4 Toshiba SATA drives and has an ext4 filesystem on top of it.
>
> It works fine unless I start several processes writing intensively to it.
> At first, it looks like the system is under high pressure, then the
> system starts lagging a lot, and a hard freeze always follows after
> several minutes.
>
> There are no errors in the system log and nothing is emitted to the console,
> just a hard freeze with the HDD light always on. I tried enabling kernel network
> logging to another machine, and again got no information when it hangs.
> After a reboot, my array starts reconstruction and finishes without
> errors.
>
> I tried disabling quotas and barriers for ext4.
> After disabling barriers, it almost seemed to work, but after some
> time the same hard freeze happens.
>
> I tested the same hardware configuration under Linux v3.10, 3.11, 3.12
> and now 3.13.5 (all x86); they all behave the same way, and the issue
> can be reproduced easily.
>
> So far I have tried everything Google suggests on the matter.
> Could you give a hint on how to debug this issue?

The most useful thing for debugging a hard freeze is the alt-sysrq-T output
when it is frozen.  Typing that magic sequence should always produce some
output unless the machine is hard-frozen with interrupts disabled.
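Before relying on the key combination it may be worth confirming that it is not masked. A minimal sketch, assuming the standard Linux interfaces /proc/sys/kernel/sysrq (0 = off, 1 = all functions, larger values = bitmask, with 0x8 covering process/task dumps) and /proc/sysrq-trigger; the helper names are hypothetical:

```python
from typing import Optional

# Bit in /proc/sys/kernel/sysrq that gates "debugging dumps of processes",
# the group that sysrq-T (show task states) belongs to.
SYSRQ_ENABLE_DUMP = 0x8


def sysrq_t_allowed(value: int) -> bool:
    """Return True if keyboard alt-sysrq-T is permitted by this sysrq value.

    0 disables SysRq, 1 enables every function, and any larger value is a
    bitmask of the function groups that may be used from the keyboard.
    """
    if value == 1:
        return True
    if value > 1:
        return bool(value & SYSRQ_ENABLE_DUMP)
    return False


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> Optional[int]:
    """Read the current SysRq setting, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def trigger_show_tasks(path: str = "/proc/sysrq-trigger") -> None:
    """Dump task states to the kernel log, like pressing alt-sysrq-T (needs root).

    Writes to /proc/sysrq-trigger are not filtered by the sysrq bitmask;
    the mask only restricts the keyboard combination.
    """
    with open(path, "w") as f:
        f.write("t")


if __name__ == "__main__":
    current = read_sysrq()
    if current is None:
        print("kernel.sysrq is not readable here")
    elif sysrq_t_allowed(current):
        print(f"kernel.sysrq={current}: alt-sysrq-T should work from the keyboard")
    else:
        print(f"kernel.sysrq={current}: alt-sysrq-T is masked; "
              "as root, 'echo 1 > /proc/sys/kernel/sysrq' enables it")
```

Calling trigger_show_tasks() as root while the system is healthy is one way to check that the dump actually reaches the log before the hang is reproduced.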

So make sure you can produce the output when the system is working properly
(to a log file; the network console would be ideal), then when it hangs,
produce the output again.
You probably need to have a text console rather than a graphical console for it
to work.
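For capturing the output on a second machine, the netconsole module takes a parameter of the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], and the receiving side only has to record UDP datagrams. A sketch under those assumptions; netconsole_param and receive_to_file are hypothetical helpers, the ports are netconsole's documented defaults (6665 source, 6666 target), and leaving the MAC field empty falls back to the kernel's default (broadcast):

```python
import socket
from typing import Optional


def netconsole_param(src_ip: str, dev: str, tgt_ip: str,
                     tgt_mac: str = "", src_port: int = 6665,
                     tgt_port: int = 6666) -> str:
    """Build a netconsole= module/boot parameter (hypothetical helper).

    Format: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr].
    An empty tgt_mac leaves the field blank so the kernel default is used.
    """
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def receive_to_file(sock: socket.socket, path: str,
                    max_packets: Optional[int] = None) -> int:
    """Append every UDP datagram received on sock to the file at path.

    Returns the number of datagrams written; with max_packets=None it
    keeps receiving until interrupted.
    """
    count = 0
    with open(path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _addr = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current so nothing is lost on Ctrl-C
            count += 1
    return count
```

On the logging machine one would bind a UDP socket to port 6666 and pass it to receive_to_file; on the machine under test, the generated string can be given to the module, e.g. `modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/` (example addresses only). If the task dump does not arrive, the console log level may be filtering it; `dmesg -n 8` raises it.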


If it is hard-hanging with interrupts disabled, then it gets tricky.  I
thought there was some NMI-based lockup detector which would warn if that
happened, but I cannot find it just now.

NeilBrown
