Thread: 130 messages, 15 authors, 2013-04-17

Re: RAID performance

From: Adam Goryachev <hidden>
Date: 2013-02-07 12:49:26

On 07/02/13 22:07, Dave Cundiff wrote:
On Thu, Feb 7, 2013 at 5:19 AM, Adam Goryachev
[off-list ref] wrote:
On 07/02/13 20:07, Dave Cundiff wrote:
On Thu, Feb 7, 2013 at 1:48 AM, Adam Goryachev
[off-list ref] wrote:
Why would you plug thousands of dollars of SSD into an onboard
controller? It's probably running off a 1x PCIE shared with every
other onboard device. An LSI 8x 8 port HBA will run you a few
hundred (less than 1 SSD) and let you melt your northbridge. At least
on my Supermicro X8DTL boards I had to add active cooling to it or it
would overheat and crash at sustained IO. I can hit 2 - 2.5GB a second
doing large sequential IO with Samsung 840 Pros on a RAID10.
Because originally I was just using 4 x 2TB 7200 rpm disks in RAID10, I
upgraded to SSD to improve performance (which it did), but hadn't (yet)
upgraded the SATA controller because I didn't know if it would help.

I'm seeing conflicting information here (buy SATA card or not)...
It's not going to help your remote access any. From your configuration
it looks like you are limited to 4 gigabits. At least as long as your
NICs are not in the slot shared with the disks. If they are, you might
get some contention.

http://download.intel.com/support/motherboards/server/sb/g13326004_s1200bt_tps_r2_0.pdf

See page 17 for a block diagram of your motherboard. You have a 4x DMI
connection that PCI slot 3, your disks, and every other onboard device
share. That should be about 1.2GB/s (10 gigabits/s) of bandwidth. Your
SSDs alone could saturate that if you performed a local operation. Get
your NICs going at 4Gig and all of a sudden you'll really want that
SATA card in slot 4 or 5.
OK, I'll have to check that the 4 x 1G ethernet are in slots 4 and 5
now, not using the onboard ethernet, and not in slot 3...

If I could get close to 4Gbps (i.e., saturate the ethernet) then I think
I'd be more than happy... I don't see my SSDs running at 400MB/s though
anyway....
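
For reference, the link speeds mentioned above convert roughly as
follows. This is only a sketch: it assumes decimal units and ignores
encoding and protocol overhead, so achievable throughput will be
somewhat lower than these figures.

```python
# Rough unit conversion for the link speeds discussed above.
# Assumption: decimal units (1 Gbit = 1000 Mbit), no protocol overhead.

def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to MB/s (8 bits per byte)."""
    return gbit_per_s * 1000 / 8

dmi_mb_s = gbit_to_mbyte_per_s(10)      # shared x4 DMI link, as quoted above
nic_mb_s = gbit_to_mbyte_per_s(4 * 1)   # 4 x 1G ethernet ports combined

print(f"x4 DMI link : {dmi_mb_s:.0f} MB/s")
print(f"4 x 1GbE    : {nic_mb_s:.0f} MB/s")
```

So saturating the four ethernet ports needs on the order of 500MB/s
from the array as a whole, well under the shared DMI figure.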
2) Move from a 5 disk RAID5 to a 8 disk RAID10, giving better data
protection (can lose up to four drives) and hopefully better performance
(main concern right now), and same capacity as current.
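
A note on "can lose up to four drives": that is the best case, where
every failed drive sits in a different mirror pair. The sketch below
counts how many failure combinations an 8 disk, 2-copy RAID10 survives.
It assumes disks are mirrored in adjacent pairs (0,1), (2,3), and so on;
the actual pairing depends on the md layout and device order.

```python
from itertools import combinations

def survives(failed, n_disks=8):
    """True if no mirror pair has lost both of its members.

    Assumption: disks are paired (0,1), (2,3), ... as in a 2-copy
    layout with an even number of disks.
    """
    failed = set(failed)
    return all(not {a, a + 1} <= failed for a in range(0, n_disks, 2))

def survival_counts(n_disks=8):
    """Map k -> (survivable sets, total sets) for k failed disks."""
    result = {}
    for k in range(1, n_disks // 2 + 1):
        sets = list(combinations(range(n_disks), k))
        result[k] = (sum(survives(s, n_disks) for s in sets), len(sets))
    return result

counts = survival_counts()
for k, (ok, total) in counts.items():
    print(f"{k} failed: {ok}/{total} combinations survive")
```

Under that pairing assumption any single failure is survivable, but
only 16 of the 70 possible four-drive failures are. The capacity claim
does hold: a 5 disk RAID5 and an 8 disk RAID10 both store four disks'
worth of data.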
I've had strange issues with anything other than RAID1 or 10 with SSD.
Even with the high IO and IOP rates of SSDs the parity calcs and extra
writes still seem to penalize you greatly.
Maybe this is the single-threaded nature of RAID5 (and RAID10)?
I definitely see that. See below for a FIO run I just did on one of my
RAID10s:

md2 : active raid10 sdb3[1] sdf3[5] sde3[4] sdc3[2] sdd3[3] sda3[0]
      742343232 blocks super 1.2 32K chunks 2 near-copies [6/6] [UUUUUU]

seq-read: (g=0): rw=read, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=32
seq-write: (g=2): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=32

Run status group 0 (all jobs):
   READ: io=4096.0MB, aggrb=2149.3MB/s, minb=2149.3MB/s, maxb=2149.3MB/s, mint=1906msec, maxt=1906msec

Run status group 2 (all jobs):
  WRITE: io=4096.0MB, aggrb=1168.7MB/s, minb=1168.7MB/s, maxb=1168.7MB/s, mint=3505msec, maxt=3505msec

These drives are pretty fresh and my writes are a whole gig per second
less than my reads. It's not for lack of bandwidth either.
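
The aggregate figures can be cross-checked from the io= and mint=
fields in the fio output above. This is a small sketch that only redoes
the arithmetic; it makes no claim about why the gap exists.

```python
# Recompute aggregate bandwidth from fio's io= (MB) and mint= (ms) fields.

def mb_per_s(io_mb: float, runtime_ms: float) -> float:
    """Data transferred divided by runtime, in MB/s."""
    return io_mb / (runtime_ms / 1000)

read_bw = mb_per_s(4096.0, 1906)    # seq-read job
write_bw = mb_per_s(4096.0, 3505)   # seq-write job
ratio = write_bw / read_bw

print(f"read  {read_bw:.0f} MB/s")
print(f"write {write_bw:.0f} MB/s")
print(f"gap   {read_bw - write_bw:.0f} MB/s, write/read = {ratio:.2f}")
```

The results land within about 1MB/s of what fio reported, and the
writes come out at a little over half the read rate. That is at least
consistent with a 2-copy RAID10, where every write goes to two drives
while reads can be spread across all six.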
Can you please show your command line used, so I can try a similar test
and see a comparison?
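
The exact command line is not given in this message. As a hedged
reconstruction, the sketch below builds fio invocations from the
parameters visible in the output header (rw, bs=64K, ioengine=libaio,
iodepth=32, 4096MB of IO). The --direct=1 flag and the test file path
are assumptions, not something shown above, and the jobs are pointed at
a file rather than at the md device so that the write test does not
overwrite live data.

```python
import shlex

def fio_args(name, rw, filename, size="4g"):
    """Build a fio argument list matching the job header shown above.

    Assumptions: --direct=1 (bypass the page cache) and the target
    file are illustrative choices, not taken from the original run.
    """
    return [
        "fio",
        f"--name={name}",
        f"--rw={rw}",
        "--bs=64k",
        "--ioengine=libaio",
        "--iodepth=32",
        "--direct=1",
        f"--size={size}",
        f"--filename={filename}",
    ]

TEST_FILE = "/mnt/test/fio.dat"  # hypothetical path on the array under test

for name, rw in (("seq-read", "read"), ("seq-write", "write")):
    print(shlex.join(fio_args(name, rw, TEST_FILE)))
```

Running the two printed commands one after the other gives separate
read and write results comparable to the groups in the output above.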
Also, if your kernel does not have md TRIM support you risk taking a
SEVERE performance hit on writes. Once you complete a full write pass
on your NAND the SSD controller will require extra time to complete a
write. If your IO is mostly small and random this can cause your NAND
to become fragmented. If the fragmentation becomes bad enough you'll
be lucky to get one spinning disk's worth of write IO out of all 5
combined.
This was the reason I made the partition (for raid) smaller than the
disk, and left the rest un-partitioned. However, as you said, once I've
fully written enough data to fill the raw disk capacity, I still have a
problem. Is there some way to instruct the disk (overnight) to TRIM the
extra blank space, and do whatever it needs to tidy things up? Perhaps
this would help, at least first thing in the morning if it isn't enough
to get through the day. Potentially I could add a 6th SSD and reduce
the partition size across all of them, just so there is more blank
space to get through a full day's worth of writes?
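
To put numbers on the 6th SSD idea: in RAID5 the usable capacity is
(disks - 1) times the member partition size, so adding a disk lets each
partition shrink while the array stays the same size. The sketch below
works that through. The 480GB drive size and 400GB partition size are
purely hypothetical placeholders, not the real figures for this array.

```python
def raid5_member_size(usable_gb: float, n_disks: int) -> float:
    """Partition size per disk needed for a given RAID5 usable capacity."""
    return usable_gb / (n_disks - 1)

DISK_GB = 480.0        # hypothetical raw drive size
PARTITION_GB = 400.0   # hypothetical current RAID partition per drive

usable = PARTITION_GB * (5 - 1)           # current 5 disk RAID5
new_part = raid5_member_size(usable, 6)   # same capacity on 6 disks

spare_before = DISK_GB - PARTITION_GB
spare_after = DISK_GB - new_part

print(f"usable capacity        : {usable:.0f} GB")
print(f"partition, 5 -> 6 disks: {PARTITION_GB:.0f} -> {new_part:.0f} GB")
print(f"unpartitioned per disk : {spare_before:.0f} -> {spare_after:.0f} GB")
```

Whatever the real sizes are, going from 5 to 6 members shrinks each
partition by a fifth for the same usable capacity. Whether the freed
space actually helps depends on the drive treating it as clean, which
is the open question in this thread.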
There was a script called mdtrim that would use hdparm to manually
send the proper TRIM commands to the drives. I didn't bother looking
for a link because it scares me to death and you probably shouldn't
use it. If it gets the math wrong random data will disappear from your
disks.
Doesn't sound good... would be nice to use smartctl or similar to ask
the drive "please tidy up now". The drive itself knows that the
unpartitioned space is available.
As for changing partition sizes, you really have to know what kinds of
IO you're doing. If all you're doing is hammering these things with
tiny IOs 24x7 it's gonna end up with terrible write IO. At least my
SSDs do. If you have a decent mix of small and large it may not
fragment as badly. I ran random 4k against mine for 2 days before it
got miserably slow. Reading will always be fine.
Well, if I can re-trim daily, and have enough clean space to work for 2
days, then I should never hit this problem.... Assuming it loses *that
much* performance....

Thanks,
Adam


-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au