Re: best base / worst case RAID 5,6 write speeds

From: Dallas Clement <hidden>
Date: 2015-12-12 02:55:10

On Fri, Dec 11, 2015 at 6:38 PM, Phil Turmel [off-list ref] wrote:

On 12/11/2015 07:00 PM, Dallas Clement wrote:

quoted

So is my workload of 12 fio jobs writing sequential 2 MB blocks with
direct I/O just too abusive?  Seems so with high queue depth.

I don't think you are adjusting any hardware queue depth here.  The fio
man page is quite explicit that iodepth=N is ineffective for sequential
operations.  But you are using the libaio engine, so you are piling up
many *software* queued operations for the kernel to execute, not
operations in flight to the disks.  From the histograms in your results,
the vast majority of ops are completing at depth=4.  Further queuing is
just adding kernel overhead.

The queuing differences from one kernel to another are a driver and
hardware property, not an application property.

quoted

I started this discussion because my RAID 5 and RAID 6 write
performance is really bad.  If my system is able to write to all 12
disks at 170 MB/s in JBOD mode, I am expecting that one fio job should
be able to write at a speed of (N - 1) * X = 11 * 170 MB/s = 1870
MB/s.  However, I am getting < 700 MB/s for queue depth = 32 and < 600
MB/s for queue depth = 256.  I get similarly disappointing results for
RAID 6 writes.
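The expectation above can be written out as a small calculation. This is only a sketch of the arithmetic: it assumes ideal full-stripe sequential writes with no read-modify-write overhead, and it assumes RAID 6 gives up two disks' worth of bandwidth to parity. The function name is made up for illustration; the 170, 700 and 600 MB/s figures are the ones quoted above, and the measured values are upper bounds.

```python
def ideal_write_mbps(n_disks, per_disk_mbps, parity_disks):
    """Ideal full-stripe sequential write rate: data disks times per-disk rate."""
    return (n_disks - parity_disks) * per_disk_mbps


N_DISKS = 12         # member disks in the array
PER_DISK_MBPS = 170  # per-disk sequential write rate measured in JBOD mode

raid5_ideal = ideal_write_mbps(N_DISKS, PER_DISK_MBPS, parity_disks=1)
raid6_ideal = ideal_write_mbps(N_DISKS, PER_DISK_MBPS, parity_disks=2)  # assumption: N - 2

print("RAID 5 ideal: %d MB/s" % raid5_ideal)
print("RAID 6 ideal: %d MB/s" % raid6_ideal)

# Measured RAID 5 results were below these values, so the ratios are upper bounds.
for label, observed in (("iodepth=32", 700), ("iodepth=256", 600)):
    print("%s: under %.0f%% of the RAID 5 ideal" % (label, 100.0 * observed / raid5_ideal))
```

Under those assumptions the measured RAID 5 numbers are well under half of the ideal figure.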

That's why I suggested blktrace.  Collect a trace while a single dd is
writing to your raw array device.  Compare the large writes submitted to
the md device against the broken down writes submitted to the member
devices.

Compare the patterns and sizes from older kernels against newer kernels,
possibly varying which controllers and data paths are involved.
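One way to make that comparison concrete is to histogram write sizes from the text output of blkparse, once for the md device and once for a member disk. The following is a rough sketch, not a definitive tool: it assumes the default blkparse line layout (device, CPU, sequence, timestamp, PID, action, RWBS, sector, "+", length in 512-byte sectors), the function name is hypothetical, and the sample lines are invented for illustration rather than taken from a real trace. Which action codes show up on the md device versus the members may differ between kernels, so the action is a parameter.

```python
from collections import Counter


def write_size_histogram(lines, action="D", device=None):
    """Count write requests by size in KiB for one blkparse action code.

    Assumes the default blkparse text format:
      maj,min cpu seq timestamp pid action rwbs sector + nsectors [process]
    """
    hist = Counter()
    for line in lines:
        fields = line.split()
        # Skip summary lines and anything without a "sector + length" part.
        if len(fields) < 10 or fields[8] != "+":
            continue
        if device is not None and fields[0] != device:
            continue
        if fields[5] != action or "W" not in fields[6]:
            continue
        try:
            sectors = int(fields[9])
        except ValueError:
            continue
        hist[sectors * 512 // 1024] += 1  # 512-byte sectors to KiB
    return hist


# Invented sample lines, for illustration only.
SAMPLE = [
    "  9,0    1        1     0.000000000  1234  Q   W 0 + 4096 [dd]",
    "  8,16   1        2     0.000100000  1234  D   W 2048 + 1024 [dd]",
    "  8,16   1        3     0.000200000  1234  D   W 3072 + 8 [dd]",
    "  8,16   1        4     0.000300000  1234  D   R 4096 + 8 [dd]",
    "CPU1 (8,16):",
]

md_hist = write_size_histogram(SAMPLE, action="Q", device="9,0")
member_hist = write_size_histogram(SAMPLE, action="D", device="8,16")
print("md writes (KiB -> count):", dict(md_hist))
print("member writes (KiB -> count):", dict(member_hist))
```

Running the same function over traces from an older and a newer kernel would give two histograms per device to compare side by side.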

Phil

Hi Phil,

I don't think you are adjusting any hardware queue depth here.

Right, that was my understanding as well.  The fio iodepth setting
just controls how many I/Os can be in flight from the application
perspective.  I have not modified the hardware queue depth on my disks
at all yet.  Was saving that for later.

The fio man page is quite explicit that iodepth=N is ineffective for sequential
operations.  But you are using the libaio engine, so you are piling up
many *software* queued operations for the kernel to execute, not
operations in flight to the disks.

Right.  I understand that the fio iodepth is different from the hardware
queue depth.  But the fio man page only seems to mention a limitation
for synchronous operations, which mine are not.  I'm using direct=1 and
sync=0.

I guess what I would really like to know is how I can achieve at or
near 100% utilization on the raid device and its member disks with
fio.  Do I need to increase /sys/block/sd*/device/queue_depth and
/sys/block/sd*/queue/nr_requests to get more utilization?
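Before changing either setting, it may help to record the current values for every disk. The sketch below only reads the two sysfs files named above and changes nothing. The function name and its parameters are hypothetical, and it assumes the member disks all match the sd* pattern; files that are missing or unreadable are reported as None.

```python
import glob
import os


def read_queue_settings(sys_block="/sys/block", pattern="sd*"):
    """Return {disk: {"queue_depth": int or None, "nr_requests": int or None}}."""
    settings = {}
    for dev_dir in sorted(glob.glob(os.path.join(sys_block, pattern))):
        values = {}
        for key, rel_path in (
            ("queue_depth", os.path.join("device", "queue_depth")),
            ("nr_requests", os.path.join("queue", "nr_requests")),
        ):
            try:
                with open(os.path.join(dev_dir, rel_path)) as fh:
                    values[key] = int(fh.read().strip())
            except (OSError, ValueError):
                values[key] = None  # file missing, unreadable, or not a number
        settings[os.path.basename(dev_dir)] = values
    return settings


if __name__ == "__main__":
    for disk, values in read_queue_settings().items():
        print("%s: queue_depth=%s nr_requests=%s"
              % (disk, values["queue_depth"], values["nr_requests"]))
```

Saving this output alongside each fio run makes it easier to tell later which settings a given result was measured with.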

That's why I suggested blktrace.  Collect a trace while a single dd is
writing to your raw array device.  Compare the large writes submitted to
the md device against the broken down writes submitted to the member
devices.

Sounds good.  Will do.  What signs of trouble should I be looking for?
