Re: best base / worst case RAID 5,6 write speeds
From: Dallas Clement <hidden>
Date: 2015-12-11 23:30:26
On Fri, Dec 11, 2015 at 3:24 PM, Dallas Clement [off-list ref] wrote:
On Fri, Dec 11, 2015 at 1:34 PM, John Stoffel [off-list ref] wrote:
"Dallas" == Dallas Clement [off-list ref] writes:

Dallas> On Fri, Dec 11, 2015 at 10:32 AM, John Stoffel [off-list ref] wrote:
"Dallas" == Dallas Clement [off-list ref] writes:

Dallas> Hi Mark. I have three different controllers on this
Dallas> motherboard. A Marvell 9485 controls 8 of the disks. And an
Dallas> Intel Cougar Point controls the 4 remaining disks.
What type of PCIe slots are the controllers in? And how fast are the controllers/drives? Are they SATA1/2/3 drives?
If you're spinning in IO loops then it could be a driver issue.

Dallas> It sure is looking like that. I will try to profile the
Dallas> kernel threads today and maybe use blktrace as Phil
Dallas> recommended to see what is going on there.
What kernel are you running?

Dallas> This is pretty sad that 12 single threaded fio jobs can bring
Dallas> this system to its knees.
I think it might be better to lower the queue depth, you might be just blowing out the controller caches... hard to know.

Dallas> Hi John.
What type of PCIe slots are the controllers in? And how fast are the controllers/drives? Are they SATA1/2/3 drives?

Dallas> The MV 9485 controller is attached to an Intel Sandy Bridge
Dallas> via PCIe GEN2 x 8. This one controls 8 of the disks. The
Dallas> Intel Cougar Point is connected to the Intel Sandy Bridge via
Dallas> DMI bus. So that should all be nice and fast.

Dallas> All of the drives are SATA III, however I do have two of the
Dallas> drives connected to SATA II ports on the Cougar Point. These
Dallas> two drives used to be connected to SATA III ports on a MV
Dallas> 9125/9120 controller. But it had truly horrible write
Dallas> performance. Moving to the SATA II ports on the Cougar Point
Dallas> boosted the performance close to the same as the other drives.
Dallas> The remaining 10 drives are all connected to SATA III ports.
What kernel are you running?

Dallas> Right now, I'm using 3.10.69. But I have tried the 4.2 kernel
Dallas> in Fedora 23 with similar results.

Hmm... maybe if you're feeling adventurous you could try v4.4-rc4 and see how it works. You don't want anything between 4.2.6 and that because of problems with blk req management. I'm hazy on the details.
I think it might be better to lower the queue depth, you might be just blowing out the controller caches... hard to know.

Dallas> Good idea. I'll try lowering it to see what effect that has.

It might also make sense to try your tests starting with just 1 disk, and then adding one more disk, re-running the tests, then another disk, re-running the tests, etc. Try with one on the MV, then one on the Cougar, then one on MV and one on Cougar, etc. Try to see if you can spot where the performance falls off the cliff.

Also, which disk scheduler are you using? Instead of CFQ, you might try deadline instead.

As you can see, there's a TON of knobs to twiddle with, it's not a simple thing to do at times.

John
> It might also make sense to try your tests starting with just 1 disk, and then adding one more disk, re-running the tests, then another disk, re-running the tests, etc.
> Try to see if you can spot where the performance falls off the cliff.

Okay, did this. Interestingly, things did not fall off the cliff until adding in the 12th disk. I started adding disks one at a time beginning with the Cougar Point. The %iowait jumped up right away with this guy also.
> Also, which disk scheduler are you using? Instead of CFQ, you might try deadline instead.

I'm using deadline. I have definitely observed better performance with this vs cfq.

At this point I think I probably need to use a tool like blktrace to get more visibility than what I have with ps and iostat.
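
For anyone wanting to confirm which scheduler a given disk is actually using, here is a minimal Python sketch. It assumes the usual sysfs layout, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets. The function names and the device name passed to scheduler_for are illustrative only and are not taken from the thread.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed (active) entry from a sysfs scheduler string."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match:
        return match.group(1)
    # Some devices list a single scheduler with no brackets at all.
    stripped = text.strip()
    return stripped or None


def scheduler_for(dev):
    """Read the active scheduler for a block device name such as 'sda' (hypothetical)."""
    return active_scheduler(Path(f"/sys/block/{dev}/queue/scheduler").read_text())


# Parsing a sample string of the kind the kernel exposes in sysfs.
print(active_scheduler("noop [deadline] cfq"))
```

Calling scheduler_for once per member disk would show whether all 12 drives really are on the same scheduler.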
I have one more observation. I tried varying the queue depth across 1, 4, 16, 32, 64, 128, and 256. Surprisingly, all 12 disks are able to handle this load with queue depth <= 128. Each disk is at 100% utilization and writing 170-180 MB/s. Things start to fall apart with queue depth = 256 after adding in the 12th disk. The inflection point on load average seems to be around queue depth = 32. The load average for this 8 core system goes up to about 13 when I increase the queue depth to 64.

So is my workload of 12 fio jobs writing sequential 2 MB blocks with direct I/O just too abusive? Seems so with high queue depth.

I started this discussion because my RAID 5 and RAID 6 write performance is really bad. If my system is able to write to all 12 disks at 170 MB/s in JBOD mode, I am expecting that one fio job should be able to write to the RAID 5 array at a speed of (N - 1) * X = 11 * 170 MB/s = 1870 MB/s. However, I am getting < 700 MB/s for queue depth = 32 and < 600 MB/s for queue depth = 256. I get similarly disappointing results for RAID 6 writes.
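
The best-case arithmetic above can be written out as a short Python sketch. It assumes ideal full-stripe sequential writes, where every disk streams at the JBOD rate and only the data chunks carry payload. The RAID 6 figure uses (N - 2) * X, which follows from RAID 6 keeping two parity blocks per stripe; that number is not stated in the thread. The function name and the 700 MB/s value (taken as the upper limit of the "< 700 MB/s" RAID 5 result at queue depth 32) are illustrative.

```python
def best_case_write_mbps(n_disks, per_disk_mbps, parity_disks):
    """Upper bound on full-stripe sequential write throughput in MB/s.

    Each stripe spends `parity_disks` chunks on parity, so only the
    remaining chunks contribute to user-visible throughput.
    """
    if not 0 <= parity_disks < n_disks:
        raise ValueError("parity_disks must be in [0, n_disks)")
    return (n_disks - parity_disks) * per_disk_mbps


N = 12   # disks in the array
X = 170  # MB/s per disk measured in JBOD mode

raid5 = best_case_write_mbps(N, X, parity_disks=1)
raid6 = best_case_write_mbps(N, X, parity_disks=2)

# Treat the reported "< 700 MB/s" as an upper limit on the observed rate.
observed_raid5 = 700
fraction = observed_raid5 / raid5

print(f"RAID 5 best case: {raid5} MB/s")
print(f"RAID 6 best case: {raid6} MB/s")
print(f"Observed RAID 5 is at most {fraction:.0%} of best case")
```

Under these assumptions the measured RAID 5 rate is well under half of the theoretical ceiling, which is the gap this thread is trying to explain.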