Re: best case / worst case RAID 5,6 write speeds

From: Dallas Clement <hidden>
Date: 2015-12-11 23:30:26

On Fri, Dec 11, 2015 at 3:24 PM, Dallas Clement
[off-list ref] wrote:

On Fri, Dec 11, 2015 at 1:34 PM, John Stoffel [off-list ref] wrote:

"Dallas" == Dallas Clement [off-list ref] writes:

Dallas> On Fri, Dec 11, 2015 at 10:32 AM, John Stoffel [off-list ref] wrote:

"Dallas" == Dallas Clement [off-list ref] writes:

Dallas> Hi Mark.  I have three different controllers on this
Dallas> motherboard.  A Marvell 9485 controls 8 of the disks.  And an
Dallas> Intel Cougar Point controls the 4 remaining disks.

What type of PCIe slots are the controllers in?  And how fast are the
controllers/drives?  Are they SATA1/2/3 drives?

If you're spinning in IO loops then it could be a driver issue.

Dallas> It sure is looking like that.  I will try to profile the
Dallas> kernel threads today and maybe use blktrace as Phil
Dallas> recommended to see what is going on there.

What kernel are you running?

Dallas> It's pretty sad that 12 single-threaded fio jobs can bring
Dallas> this system to its knees.

I think it might be better to lower the queue depth; you might just be
blowing out the controller caches...  hard to know.

Dallas> Hi John.

What type of PCIe slots are the controllers in?  And how fast are the
controllers/drives?  Are they SATA1/2/3 drives?

Dallas> The MV 9485 controller is attached to an Intel Sandy Bridge
Dallas> via PCIe GEN2 x 8.  This one controls 8 of the disks.  The
Dallas> Intel Cougar Point is connected to the Intel Sandy Bridge via
Dallas> DMI bus.

So that should all be nice and fast.
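A rough link budget agrees. This is a back-of-the-envelope sketch, not a measurement. It assumes PCIe 2.0 and DMI 2.0 lanes run at 5 GT/s with 8b/10b encoding. It treats DMI 2.0 as an x4 link and uses the ~180 MB/s per-disk rate from the JBOD tests. DMI is also shared with everything else hanging off the PCH, so its real headroom is lower than shown.

```python
# Back-of-the-envelope link budget for the controller topology in this thread.
# Assumptions (not measurements): PCIe 2.0 / DMI 2.0 lanes run at 5 GT/s with
# 8b/10b encoding, DMI 2.0 is x4, SATA II is 3 Gb/s, and each disk streams
# about 180 MB/s of sequential writes.

DISK_MB_S = 180  # upper end of the per-disk rate seen in JBOD mode


def link_mb_per_s(gigatransfers_per_s: float, lanes: int) -> float:
    """One-direction payload bandwidth of an 8b/10b link, in MB/s."""
    # With 8b/10b encoding, 10 line bits carry one data byte.
    return gigatransfers_per_s * 1000 / 10 * lanes


# name -> (link capacity in MB/s, number of disks sharing that link)
LINKS = {
    "MV 9485 on PCIe 2.0 x8": (link_mb_per_s(5.0, 8), 8),
    "Cougar Point via DMI 2.0 x4": (link_mb_per_s(5.0, 4), 4),
    "one SATA II port": (link_mb_per_s(3.0, 1), 1),
}


def headroom() -> dict:
    """Return name -> (capacity, demand, capacity / demand) for each link."""
    result = {}
    for name, (capacity, disks) in LINKS.items():
        demand = disks * DISK_MB_S
        result[name] = (capacity, demand, capacity / demand)
    return result


if __name__ == "__main__":
    for name, (cap, demand, ratio) in headroom().items():
        print(f"{name}: {cap:.0f} MB/s available, {demand} MB/s needed, "
              f"{ratio:.1f}x headroom")
```

On paper every link has well over 1.5x headroom, including the two disks on SATA II ports. That fits the view that the interconnect shouldn't be the bottleneck.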

Dallas> All of the drives are SATA III, however I do have two of the
Dallas> drives connected to SATA II ports on the Cougar Point.  These
Dallas> two drives used to be connected to SATA III ports on a MV
Dallas> 9125/9120 controller.  But it had truly horrible write
Dallas> performance.  Moving to the SATA II ports on the Cougar Point
Dallas> brought their performance close to that of the other drives.
Dallas> The remaining 10 drives are all connected to SATA III ports.

What kernel are you running?

Dallas> Right now, I'm using 3.10.69.  But I have tried the 4.2 kernel
Dallas> in Fedora 23 with similar results.

Hmm... maybe if you're feeling adventurous you could try v4.4-rc4 and
see how it works.  You don't want anything between 4.2.6 and that
because of problems with blk req management.  I'm hazy on the details.

I think it might be better to lower the queue depth; you might just be
blowing out the controller caches...  hard to know.

Dallas> Good idea.  I'll try lowering it to see what effect it has.

It might also make sense to try your tests starting with just 1 disk,
and then adding one more disk, re-running the tests, then another
disk, re-running the tests, etc.

Try with one on the MV, then one on the Cougar, then one on MV and one
on Cougar, etc.

Try to see if you can spot where the performance falls off the cliff.
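One way to script that incremental test is to fix an order that alternates between the two controllers. You then run the same fio job against the first 1, 2, ..., 12 disks in that order. This is a minimal sketch. The /dev/sdX names and the disk-to-controller mapping are hypothetical, so check them against lsblk or /sys/block first. Also remember that writing to raw devices destroys whatever is on them.

```python
from itertools import zip_longest

# Hypothetical device names; the real mapping depends on the system.
MV_9485 = [f"/dev/sd{c}" for c in "abcdefgh"]   # 8 disks on the Marvell HBA
COUGAR_POINT = [f"/dev/sd{c}" for c in "ijkl"]  # 4 disks on the Intel PCH


def interleaved(*controllers):
    """Alternate between controllers: MV, Cougar, MV, Cougar, ..."""
    order = []
    for group in zip_longest(*controllers):
        # zip_longest pads the shorter list with None once it runs out.
        order.extend(dev for dev in group if dev is not None)
    return order


def incremental_steps(order):
    """Disk sets to test: the first disk alone, then the first two, etc."""
    return [order[:n] for n in range(1, len(order) + 1)]


if __name__ == "__main__":
    plan = incremental_steps(interleaved(MV_9485, COUGAR_POINT))
    for step, disks in enumerate(plan, 1):
        print(f"step {step:2d}: {' '.join(disks)}")
```

Each step's disk list would then go to whatever fio invocation is already in use. The step where throughput drops or %iowait jumps points at the disk or controller that tips things over.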

Also, which disk scheduler are you using?  Instead of CFQ, you might
try deadline.
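Checking and switching the scheduler goes through sysfs. /sys/block/<dev>/queue/scheduler lists the available elevators with the active one in brackets (for example "noop deadline [cfq]" on a 3.10 kernel). Writing a name to the file switches that device. Below is a small sketch of both operations. The helper names are made up, writing requires root, and the change does not survive a reboot.

```python
from pathlib import Path


def scheduler_path(dev: str) -> Path:
    """sysfs file for a block device name such as 'sda'."""
    return Path("/sys/block") / dev / "queue" / "scheduler"


def parse_scheduler(text: str):
    """Split 'noop [deadline] cfq' into (active, [available, ...])."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token  # the bracketed entry is the one in use
        available.append(token)
    return active, available


def set_scheduler(dev: str, name: str) -> None:
    """Switch one device's I/O scheduler (needs root)."""
    path = scheduler_path(dev)
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{dev}: {name!r} not offered, choices are {available}")
    path.write_text(name)
```

For example, call set_scheduler("sdb", "deadline") for each array member, then re-read the file to confirm the brackets moved. Newer blk-mq kernels use different names (mq-deadline, none), which is why the sketch checks the offered list instead of assuming one.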

As you can see, there's a TON of knobs to twiddle with, it's not a
simple thing to do at times.

John

It might also make sense to try your tests starting with just 1 disk,
and then adding one more disk, re-running the tests, then another
disk, re-running the tests, etc.

Try to see if you can spot where the performance falls off the cliff.

Okay, did this.  Interestingly, things did not fall off the cliff until
adding in the 12th disk.  I started adding disks one at a time,
beginning with the Cougar Point.  The %iowait jumped up right away
with this one as well.

Also, which disk scheduler are you using?  Instead of CFQ, you might
try deadline.

I'm using deadline.  I have definitely observed better performance
with it than with cfq.

At this point I think I probably need to use a tool like blktrace to
get more visibility than I have with ps and iostat.
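With blktrace, a common first step is to split each request's time into block-layer time and device time. blkparse's default text output begins each event line with these columns: device, CPU, sequence number, timestamp, PID, action, RWBS flags and start sector. In the action column, Q marks a request being queued, D marks it being issued to the driver, and C marks its completion. The sketch below pairs those events by device and start sector. From them it computes Q2C (total time) and D2C (time spent in the driver, controller and disk). It assumes that default output format and simply drops merged or split requests. btt, which ships with blktrace, reports these same metrics more rigorously.

```python
from collections import defaultdict


def q2c_d2c(lines):
    """Pair Q/D/C events from blkparse text output.

    Returns {"Q2C": [...], "D2C": [...]} with latencies in seconds.
    Events are matched on (device, start sector), so requests that were
    merged or split go unmatched and are left out.
    """
    queued, issued = {}, {}
    out = defaultdict(list)
    for line in lines:
        f = line.split()
        # Expected columns: dev cpu seq time pid action rwbs sector + nblocks ...
        if len(f) < 8 or f[5] not in ("Q", "D", "C") or not f[7].isdigit():
            continue  # summaries, other actions, anything unexpected
        key = (f[0], int(f[7]))
        t = float(f[3])
        if f[5] == "Q":
            queued[key] = t
        elif f[5] == "D":
            issued[key] = t
        else:  # "C": the request completed
            if key in queued:
                out["Q2C"].append(t - queued.pop(key))
            if key in issued:
                out["D2C"].append(t - issued.pop(key))
    return dict(out)


def summarize(lat):
    """Mean and max latency in milliseconds for each metric."""
    return {k: (1000 * sum(v) / len(v), 1000 * max(v)) for k, v in lat.items() if v}
```

For instance, capture one disk during a fio run with "blktrace -d /dev/sdb -o - | blkparse -i - > sdb.txt", then call summarize(q2c_d2c(open("sdb.txt"))). Suppose D2C stays flat while Q2C balloons as disks are added. That would suggest the time is lost above the driver rather than in the controller or drives.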

I have one more observation.  I tried queue depths of 1, 4, 16, 32,
64, 128, and 256.  Surprisingly, all 12 disks are able to handle this
load with queue depth <= 128: each disk is at 100% utilization and
writing 170-180 MB/s.  Things start to fall apart at queue depth = 256
after adding in the 12th disk.  The inflection point in load average
seems to be around queue depth = 32; the load average for this 8-core
system goes up to about 13 when I increase the queue depth to 64.
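Little's law gives a rough sense of what those queue depths mean. If a disk completes about 175 MB/s / 2 MB = 87.5 writes per second, each of qd outstanding writes waits roughly qd / 87.5 seconds. Across 12 jobs, that is 12 x qd x 2 MB of data in flight. The sketch also counts how many writes per disk cannot fit in a SATA drive's NCQ queue (at most 32 commands) and so wait in the host instead. These are estimates built from the thread's own numbers, not measurements, and they ignore caching in the drives and controllers.

```python
# Little's law (L = lambda * W) applied to the queue-depth sweep above.
# Inputs are the thread's figures; the NCQ limit is the SATA maximum
# (Linux libata may expose one tag fewer). Estimates only.

BLOCK_MB = 2          # fio bs=2M
PER_DISK_MB_S = 175   # middle of the 170-180 MB/s observed per disk
DISKS = 12
SATA_NCQ_MAX = 32     # commands a SATA drive can hold in its own queue


def estimate(iodepth: int):
    """Return (IOPS per disk, mean latency in s, MB in flight, writes held in host)."""
    iops = PER_DISK_MB_S / BLOCK_MB                 # completions per second
    latency_s = iodepth / iops                      # W = L / lambda
    in_flight_mb = DISKS * iodepth * BLOCK_MB       # across all 12 jobs
    waiting_in_host = max(0, iodepth - SATA_NCQ_MAX)
    return iops, latency_s, in_flight_mb, waiting_in_host


if __name__ == "__main__":
    for qd in (1, 4, 16, 32, 64, 128, 256):
        _, lat, mb, host = estimate(qd)
        print(f"qd={qd:3d}: ~{lat * 1000:5.0f} ms per write, "
              f"{mb:5d} MB in flight, {host:3d} per disk queued above NCQ")
```

At queue depth 256 this works out to roughly 2.9 s per write and about 6 GB of outstanding data. Most of each queue sits in the kernel rather than on the drive. That may help explain why per-disk throughput stays flat while latency and load average climb.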

So is my workload of 12 fio jobs writing sequential 2 MB blocks with
direct I/O just too abusive?  It seems so at high queue depths.
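For reference, a job file along these lines should reproduce that workload, with one job per raw disk. It is only a sketch, with hypothetical device names. The thread doesn't say which ioengine was used. libaio is assumed here, since synchronous engines effectively run at a queue depth of 1. Running the resulting job file overwrites the target disks.

```python
def fio_jobfile(devices, iodepth=32, runtime_s=60):
    """Build fio job-file text: one sequential 2 MB direct-write job per device."""
    lines = [
        "[global]",
        "rw=write",           # sequential writes
        "bs=2M",              # 2 MB blocks
        "direct=1",           # O_DIRECT, bypassing the page cache
        "ioengine=libaio",    # assumed; async engine so iodepth takes effect
        f"iodepth={iodepth}",
        f"runtime={runtime_s}",
        "time_based",         # keep writing for the full runtime
        "",
    ]
    for dev in devices:
        name = dev.rsplit("/", 1)[-1]  # "/dev/sdb" -> job "[sdb]"
        lines += [f"[{name}]", f"filename={dev}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical member disks; substitute the real ones.
    disks = [f"/dev/sd{c}" for c in "bcdefghijklm"]
    print(fio_jobfile(disks, iodepth=32))
```

You could save the output as, say, seqwrite.fio and run "fio seqwrite.fio" once per iodepth value. That repeats the sweep with identical options each time.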

I started this discussion because my RAID 5 and RAID 6 write
performance is really bad.  If my system is able to write to all 12
disks at 170 MB/s in JBOD mode, I would expect one fio job writing to
a 12-disk RAID 5 array to reach (N - 1) * X = 11 * 170 MB/s =
1870 MB/s, where N is the number of disks and X is the per-disk write
speed.  However, I am getting < 700 MB/s for queue depth = 32 and
< 600 MB/s for queue depth = 256.  I get similarly disappointing
results for RAID 6 writes.
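The same estimate can be written as a function, with RAID 6 losing two disks' worth of bandwidth to parity instead of one. It is only an upper bound. It assumes md can always write full stripes without reading old data or parity back. Whether 2 MB writes allow that depends partly on the array's chunk size. The 512 KiB chunk below is an example value, not something stated in the thread.

```python
def ideal_write_mb_s(n_disks: int, per_disk_mb_s: float, parity_disks: int) -> float:
    """Upper bound for large sequential writes: only data disks add bandwidth."""
    return (n_disks - parity_disks) * per_disk_mb_s


def full_stripe_kib(n_disks: int, parity_disks: int, chunk_kib: int) -> int:
    """Data in one full stripe; smaller or misaligned writes may force md to read first."""
    return (n_disks - parity_disks) * chunk_kib


if __name__ == "__main__":
    N, X = 12, 170  # disks and per-disk MB/s from the JBOD test
    for level, parity in (("RAID 5", 1), ("RAID 6", 2)):
        print(f"{level}: <= {ideal_write_mb_s(N, X, parity):.0f} MB/s ideal")
    # Observed RAID 5 ceiling at queue depth 32, as a fraction of the ideal.
    print(f"RAID 5 at qd=32: under {700 / ideal_write_mb_s(N, X, 1):.0%} of ideal")
    stripe = full_stripe_kib(N, 1, chunk_kib=512)  # example chunk size only
    print(f"RAID 5 full stripe with 512 KiB chunks: {stripe} KiB vs a 2048 KiB write")
```

By this measure the RAID 5 result is under about 37% of the ideal 1870 MB/s, and the RAID 6 ceiling would be 1700 MB/s. With 12 disks and 512 KiB chunks, a single 2 MB write would cover well under half a stripe.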
