Thread (13 messages, 7 authors, 2009-02-20)

Re: sun x4500 soft lockup during raid creation

From: Bill Davidsen <hidden>
Date: 2009-01-28 22:31:48

Vladimir Ivashchenko wrote:
Hi,

We've got these new Sun X4500 servers. The system I'm playing with now
has 48 x 250 GB SATA HDDs.

Right now I'm creating two RAID6 arrays, with 24 and 22 drives respectively:

mdadm --verbose --create /dev/md3 --level=6 \
--raid-devices=24 /dev/sda /dev/sdaa /dev/sdab /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai /dev/sdaj /dev/sdak /dev/sdal /dev/sdam /dev/sdan /dev/sdao /dev/sdap /dev/sdaq /dev/sdar /dev/sdas /dev/sdat /dev/sdau /dev/sdav /dev/sdb /dev/sdc

mdadm --verbose --create /dev/md4 --level=6 \
--raid-devices=22 /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdz
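For reference, RAID6 gives up two members' worth of space to its P and Q parity, so the nominal usable capacity is (n - 2) x drive size. Here is a minimal Python sketch of that arithmetic for the two arrays above. It ignores md superblock overhead and the GB-vs-GiB difference, and the function name is made up for illustration:

```python
def raid6_usable_gb(n_drives: int, drive_gb: int) -> int:
    """Nominal usable capacity of a RAID6 array (hypothetical helper).

    Two drives' worth of space per array goes to P and Q parity.
    Superblock/metadata overhead is ignored.
    """
    if n_drives < 4:
        raise ValueError("RAID6 needs at least 4 member devices")
    return (n_drives - 2) * drive_gb


md3 = raid6_usable_gb(24, 250)  # 24 x 250 GB members
md4 = raid6_usable_gb(22, 250)  # 22 x 250 GB members
print(md3, md4)  # 5500 5000
```

So md3 should come out at roughly 22 x 250 GB = 5500 GB and md4 at roughly 20 x 250 GB = 5000 GB before formatting.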

mdadm --detail reports that everything is going smoothly; however,
my /var/log/messages is full of "BUG: soft lockup - CPU#X stuck for
10s!" errors appearing every 1-3 minutes.

CentOS 5.2, 2.6.18-92.1.22.el5PAE, sata_mv. Two dual-core Opterons @ 2.8
GHz, 16 GB RAM.

The system does not crash and otherwise seems to be healthy. Arrays are
still under construction and I don't know if they will actually work
yet.

What I noticed is that at first it was complaining about lockups in the md3
process, but once I started creating md4, the complaints were exclusively
about the md4 process.
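One way to confirm which md thread the warnings point at is to tally them per task. Below is a minimal Python sketch. It assumes each warning ends with a "[task:pid]" tag, as newer kernels print it. The quoted message above doesn't show that tag, so adjust the pattern to whatever your /var/log/messages actually contains. The function name, thread names, and sample lines are all hypothetical:

```python
import re
from collections import Counter

# Assumed warning shape: "BUG: soft lockup - CPU#<n> stuck for <s>s! [<task>:<pid>]".
# The trailing "[task:pid]" part is an assumption; older kernels may omit it.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def tally_lockups(lines):
    """Count soft-lockup warnings per task name in an iterable of log lines."""
    per_task = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            per_task[m.group(3)] += 1  # group 3 is the task name
    return per_task


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "Jan 28 10:00:01 host kernel: BUG: soft lockup - CPU#1 stuck for 10s! [md3_raid6:1234]",
        "Jan 28 10:02:30 host kernel: BUG: soft lockup - CPU#2 stuck for 10s! [md4_raid6:1240]",
        "Jan 28 10:04:12 host kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md4_raid6:1240]",
        "Jan 28 10:05:00 host kernel: some unrelated message",
    ]
    for task, n in tally_lockups(sample).most_common():
        print(f"{task}: {n}")
```

On a live system, pass the open log file instead of the sample list. The function accepts any iterable of lines.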

Any stability assurances or workarounds are highly appreciated. :)
  
Comments about soft lockups during md initialization have recently popped
up on several lists, and the consensus seems to be that some internal md
operations keep one or more CPUs busy long enough to trip the soft-lockup
watchdog, but that this is not a failure. I'm guessing that a more recent
kernel might not do this, but the warning probably doesn't indicate a
functional problem.

My take on moving to a newer kernel:
- you went with CentOS instead of Fedora, so you got stable rather than
cutting edge
- RHEL 5.3 just came out, and CentOS 5.3 is coming out soon
- it's not a functional problem

I'm planning to move some machines to CentOS 5.3, and I run Fedora on
the rest. I haven't found a happy middle ground between "most recent" and
"most stable" on my systems. I would ignore the warning unless it happens
during normal operation.

-- 
Bill Davidsen [off-list ref]
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck
