Thread: 3 messages, 2 authors, 2006-05-17

Re: raid5 hang on get_active_stripe

From: dean gaudet <hidden>
Date: 2006-05-17 18:41:28

On Thu, 11 May 2006, dean gaudet wrote:
> On Tue, 14 Mar 2006, Neil Brown wrote:
>> On Monday March 13, patrik@ucolick.org wrote:
>>> I just experienced some kind of lockup accessing my 8-drive raid5
>>> (2.6.16-rc4-mm2). The system has been up for 16 days running fine, but
>>> now processes that try to read the md device hang. ps tells me they are
>>> all sleeping in get_active_stripe. There is nothing in the syslog, and I
>>> can read from the individual drives fine with dd. mdadm says the state
>>> is "active".
>> ...
>
> i seem to be running into this as well... it has happened several times
> in the past three weeks.  i attached the kernel log output...

it happened again...  same system as before...

>> You could try increasing the size of the stripe cache
>>   echo 512 > /sys/block/mdX/md/stripe_cache_size
>> (choose an appropriate 'X').
>
> yeah that got things going again -- it took a minute or so maybe, i
> wasn't paying attention as to how fast things cleared up.

i tried 768 this time and it wasn't enough... 1024 did it again...

>> Maybe check the content of
>>          /sys/block/mdX/md/stripe_cache_active
>> as well.
>
> next time i'll check this before i increase stripe_cache_size... it's
> 0 now, but the raid5 is working again...

here's a sequence of things i did... not sure if it helps:

# cat /sys/block/md4/md/stripe_cache_active
435
# cat /sys/block/md4/md/stripe_cache_size
512
# echo 768 >/sys/block/md4/md/stripe_cache_size
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# cat /sys/block/md4/md/stripe_cache_active
752
# echo 1024 >/sys/block/md4/md/stripe_cache_size
# cat /sys/block/md4/md/stripe_cache_active
927
# cat /sys/block/md4/md/stripe_cache_active
151
# cat /sys/block/md4/md/stripe_cache_active
66
# cat /sys/block/md4/md/stripe_cache_active
2
# cat /sys/block/md4/md/stripe_cache_active
1
# cat /sys/block/md4/md/stripe_cache_active
0
# cat /sys/block/md4/md/stripe_cache_active
3
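The repeated `cat` calls above are a manual version of a simple check: while reads were hung, stripe_cache_active sat at the same nonzero value (752 of 768) on every read, and once the size went to 1024 the count started moving and drained. A minimal POSIX sh sketch of that check follows. The function name `check_stripe_cache`, its defaults, and the rule "identical nonzero samples means stuck" are assumptions for illustration, not anything md or mdadm provides; a busy but healthy array can also hold a steady count briefly, so treat the result as a hint only.

```sh
# Hypothetical helper (not part of md or mdadm): sample stripe_cache_active
# several times and flag a count that stays at one nonzero value.
# Usage: check_stripe_cache [md_sysfs_dir] [samples] [interval_seconds]
check_stripe_cache() {
    dir=${1:-/sys/block/md4/md}   # md4 as in the session above; adjust
    samples=${2:-5}
    interval=${3:-1}
    size=$(cat "$dir/stripe_cache_size") || return 2
    first=$(cat "$dir/stripe_cache_active") || return 2
    echo "active=$first size=$size"
    changed=0
    i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        cur=$(cat "$dir/stripe_cache_active") || return 2
        echo "active=$cur size=$size"
        if [ "$cur" != "$first" ]; then
            changed=1
        fi
        i=$((i + 1))
    done
    # Heuristic assumption: identical nonzero samples suggest a stuck cache;
    # any change in the count suggests it is still making progress.
    if [ "$changed" -eq 0 ] && [ "$first" -gt 0 ]; then
        echo "stuck"
        return 1
    fi
    echo "ok"
}
```

Something like `check_stripe_cache /sys/block/md4/md 7 1` roughly mirrors the seven reads above; a count pinned at 752 would make it print `stuck` and return 1.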

and it's OK again... except i'm going to lower the stripe_cache_size to
256 again because i'm not sure i want to keep having to double it each
freeze :)

let me know if you want the task dump output from this one too.

-dean