Re: best base / worst case RAID 5,6 write speeds
From: Dallas Clement <hidden>
Date: 2015-12-12 02:55:10
On Fri, Dec 11, 2015 at 6:38 PM, Phil Turmel [off-list ref] wrote:
On 12/11/2015 07:00 PM, Dallas Clement wrote:
So is my workload of 12 fio jobs writing sequential 2 MB blocks with direct I/O just too abusive? Seems so with high queue depth.
I don't think you are adjusting any hardware queue depth here. The fio man page is quite explicit that iodepth=N is ineffective for sequential operations. But you are using the libaio engine, so you are piling up many *software* queued operations for the kernel to execute, not operations in flight to the disks. From the histograms in your results, the vast majority of ops are completing at depth=4. Further queuing is just adding kernel overhead. The queuing differences from one kernel to another is a driver and hardware property, not an application property.
I started this discussion because my RAID 5 and RAID 6 write performance is really bad. If my system is able to write to all 12 disks at 170 MB/s in JBOD mode, I am expecting that one fio job should be able to write at a speed of (N - 1) * X = 11 * 170 MB/s = 1870 MB/s. However, I am getting < 700 MB/s for queue depth = 32 and < 600 MB/s for queue depth = 256. I get similarly disappointing results for RAID 6 writes.
That's why I suggested blktrace. Collect a trace while a single dd is writing to your raw array device. Compare the large writes submitted to the md device against the broken down writes submitted to the member devices. Compare the patterns and sizes from older kernels against newer kernels, possibly varying which controllers and data paths are involved.
Phil
Hi Phil,
I don't think you are adjusting any hardware queue depth here.
Right, that was my understanding as well. The fio iodepth setting just controls how many I/Os can be in flight from the application perspective. I have not modified the hardware queue depth on my disks at all yet. Was saving that for later.
The fio man page is quite explicit that iodepth=N is ineffective for sequential operations. But you are using the libaio engine, so you are piling up many *software* queued operations for the kernel to execute, not operations in flight to the disks.
Right. I understand that the fio iodepth is different from the hardware queue depth. But the fio man page seems to mention a limitation only for synchronous operations, which mine are not. I'm using direct=1 and sync=0. What I would really like to know is how to achieve at or near 100% utilization on the RAID device and its member disks with fio. Do I need to increase /sys/block/sd*/device/queue_depth and /sys/block/sd*/queue/nr_requests to get more utilization?
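Before changing either setting, it helps to record what the disks are currently using. A minimal sketch, assuming Python 3 and the two sysfs paths named above; the function name read_queue_settings and its parameters are made up for illustration, and any file that is missing or unreadable is reported as None:

```python
from pathlib import Path


def read_queue_settings(sys_block="/sys/block", pattern="sd*"):
    """Return {disk: {"queue_depth": int or None, "nr_requests": int or None}}.

    Reads <sys_block>/<disk>/device/queue_depth and
    <sys_block>/<disk>/queue/nr_requests for every disk matching `pattern`.
    """
    settings = {}
    for disk in sorted(Path(sys_block).glob(pattern)):
        values = {}
        for name, rel in (("queue_depth", "device/queue_depth"),
                          ("nr_requests", "queue/nr_requests")):
            try:
                values[name] = int((disk / rel).read_text().strip())
            except (OSError, ValueError):
                # File absent, unreadable, or not a plain integer
                values[name] = None
        settings[disk.name] = values
    return settings


if __name__ == "__main__":
    for disk, values in read_queue_settings().items():
        print(disk, values)
```

Running it once before and once after any tuning gives a record of what actually changed on each member disk.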
That's why I suggested blktrace. Collect a trace while a single dd is writing to your raw array device. Compare the large writes submitted to the md device against the broken down writes submitted to the member devices.
Sounds good. Will do. What signs of trouble should I be looking for?
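One way to make the size comparison concrete is to histogram the write sizes seen on the md device and on each member device. A rough sketch, assuming Python 3 and the default blkparse text output, where each event line has the fields device, CPU, sequence, timestamp, PID, action, RWBS, start sector, "+", and length in 512-byte sectors. The function name write_size_histogram is hypothetical, and lines that do not fit that layout are skipped:

```python
from collections import Counter

SECTOR_BYTES = 512  # blkparse reports request lengths in 512-byte sectors


def write_size_histogram(lines, action="D"):
    """Count write requests by size in KiB for one blkparse action code.

    `action` is the blkparse action column, e.g. "Q" (queued) or
    "D" (issued to the driver). Only events whose RWBS field contains
    "W" are counted.
    """
    hist = Counter()
    for line in lines:
        fields = line.split()
        # Expected: dev cpu seq time pid action rwbs sector + nsectors [proc]
        if len(fields) < 10 or fields[8] != "+":
            continue
        if fields[5] != action or "W" not in fields[6]:
            continue
        try:
            nsectors = int(fields[9])
        except ValueError:
            continue
        hist[nsectors * SECTOR_BYTES // 1024] += 1
    return hist
```

The function accepts any iterable of lines, so it can be given an open file holding saved blkparse output for the md device and then again for a member disk, and the two histograms compared side by side.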