Thread: 130 messages, 15 authors, 2013-04-17

Re: RAID performance - new kernel results

From: Charles Polisher <hidden>
Date: 2013-03-10 15:35:18

On Mar 02, 2013 Adam Goryachev wrote:
> On 24/02/13 02:57, John Stoffel wrote:
> > Can I please ask you to sit down and write a paper for USENIX on this
> > whole issue and how you resolved it?  You and Stan have done a great
> > job here documenting and discussing the problems, troubleshooting
> > methods and eventual solution(s) to the problem.
> >
> > It would be wonderful to have some diagrams to go with all this
> > discussion, showing the original network setup, iSCSI disk setup,
> > etc.  Then how you updated and changed things to find bottlenecks.
> >
> > The interesting thing is the complete slowdown when using LVM
> > snapshots, which points to major possibilities for performance
> > improvements there.  But those improvements will be hard to do without
> > being able to run on real hardware, which is expensive for people to
> > have at home.
> >
> > I've been following this discussion from day one and really enjoying
> > it and I've learned quite a bit about iSCSI, networking and some of
> > the RAID issues.  I too run Debian stable on my home NFS/VM/mail/mysql
> > server and I've been getting frustrated by how far back it is, even
> > with backports.  I got burned in the past by testing, which is why I
> > stay on stable, but now I'm feeling like I'm getting burned on stable
> > too.  *grin*  It's a balancing act for sure!

Hi Adam, John, and Stan,

I too have been poring over this thread for weeks while building and
testing arrays in my lab, trying techniques you've been tossing
around, diagramming hardware & software, and generating plots of
the results. It's quite interesting work, though friends are
asking pointed questions about where I've been.

Last night's episode was tweaking the IO queue scheduler: with
a raid0-on-raid5x2 I saw a 40% boost in IOPS for an 80/20 mix of
random read/write (noop vs. cfq).
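
For anyone wanting to repeat the scheduler comparison, here is a minimal
Python sketch of the mechanics, not the script actually used for the
numbers above. It reads and switches the scheduler through the standard
sysfs file /sys/block/<dev>/queue/scheduler (which lists the choices and
brackets the active one, e.g. "noop deadline [cfq]") and computes the
relative change between two IOPS figures. The function names and the
device name "sdb" are made up for illustration, and the write needs root.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available); active is None if nothing is bracketed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def scheduler_path(device):
    """Path of the scheduler control file for a block device like 'sdb'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Select an IO scheduler for one block device (requires root)."""
    path = scheduler_path(device)
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)


def percent_change(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical device name; on an array, repeat for each member disk.
    print(parse_scheduler(scheduler_path("sdb").read_text()))
```

A run would call set_scheduler("sdb", "noop") for each member disk, repeat
the benchmark, and feed the two IOPS results to percent_change().
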

> I've never written anything like that, but I think I could write a book
> on this. I keep thinking I should get a blog and put stuff like this on
> there, but there is always something else to do, and I'm not the sort of
> person to write in my diary every day :)
>
> I've already written up a sort of non-technical summary for the client
> (about 5 pages), and just sent a non-detailed technical summary to the
> list. Once everything is completed and settled, I can try and combine
> those two, maybe throw in a bunch of extra details (command lines,
> config files, etc), and see where it ends up. I suppose you are
> volunteering as editor <G>

I can assist with testbeds, scripts, and visualizations that
support this process. I also have some editing skills. My
personal goal for this year (and maybe next) is to build an open
source tool that takes a system description, projects figures of
merit (price, performance, reliability) for specified workloads,
and scripts the setup, benching, data collection, and
visualization tasks. It seems there could be a lot of overlap
between my project and what is needed to put together an article.
Contact me if you'd like to explore working together.

Lastly, Adam: if MS Active Directory 2003 has any large group
objects (> 500 members), there can be large peaks in replication
traffic when group memberships change. There are other scenarios
for AD 2003 high-traffic issues. You could try using MS's
typeperf command line utility or their performance monitor GUI
to check the "DRA" inbound and outbound traffic during periods
of high disk/net activity. Also, from experience, you might check
whether high CPU usage is related to anti-virus software that
hasn't been fenced out from checking the DIT (ntds.dit).
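
As a rough illustration of the typeperf approach, the Python sketch below
builds a typeperf command line and pulls the peak value per counter out of
the CSV log it produces. The two counter paths are written from memory of
the NTDS performance object and are an assumption; "typeperf -q NTDS"
should list the exact names on the domain controller. The function names,
sample interval, and output file name are hypothetical.

```python
import csv
import io

# Assumed counter paths; verify on the DC with: typeperf -q NTDS
COUNTERS = [
    r"\NTDS\DRA Inbound Bytes Total/sec",
    r"\NTDS\DRA Outbound Bytes Total/sec",
]


def typeperf_command(counters, interval_s=5, samples=120, out_file="dra.csv"):
    """Build a typeperf argument list: sample every interval_s seconds,
    take `samples` samples, and log them as CSV to out_file."""
    return ["typeperf", *counters,
            "-si", str(interval_s),
            "-sc", str(samples),
            "-f", "CSV",
            "-o", out_file]


def peak_per_counter(csv_text):
    """Return {counter name: highest sampled value} from typeperf CSV text.

    The first column is the timestamp; blank or non-numeric samples
    are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    peaks = {}
    for col, name in enumerate(header[1:], start=1):
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (IndexError, ValueError):
                continue
        if values:
            peaks[name] = max(values)
    return peaks


if __name__ == "__main__":
    # Print the command to run on the domain controller.
    print(" ".join(f'"{a}"' if " " in a else a
                   for a in typeperf_command(COUNTERS)))
```

Lining up the timestamps of the peaks with the periods of high disk/net
activity would show whether replication is a plausible contributor.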

Best regards,
-- 
Charles Polisher
