Thread: 36 messages, 14 authors, 2011-05-07

Re: mdadm raid1 read performance

From: Roberto Spadim <hidden>
Date: 2011-05-04 23:35:19

2011/5/4 Liam Kurmos [off-list ref]:
Thanks to all who replied on this.

I somewhat naively assumed that having 2 disks with the same data
would mean a read speed similar to raid0 (and I think this is a very
popular misconception).
I was neglecting the seek time needed to skip alternate blocks, which I
guess must be the flaw.

In theory though, if I was reading a larger file, couldn't one disk
start reading from the beginning into a buffer and the other start
reading from halfway (assuming 2 disks), and hence get close to 2x
single-disk speed?
hmm... maybe, that's close to what LINEAR does. It depends on how Linux
splits one large read into smaller reads, and on how the program calls
fread(): many small freads, or one big fread.
Check this out:

1 disk blocks:
disk1: ABCDEFGH

raid0 (stripe) 2 disks
disk1: ACEG
disk2: BDFH

raid1 (no stripe) 2 disks
disk1: ABCDEFGH
disk2: ABCDEFGH

raid0 (linear) 2 disks
disk1: ABCD
disk2: EFGH
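A minimal C sketch of the layouts above, assuming a chunk size of one block so it matches the diagrams (real md arrays stripe in larger chunks, set with mdadm's --chunk option). The helper names map_stripe, map_linear, map_mirror and print_stripe_layout are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Where a logical block lives. Blocks count from 0: A=0, B=1, ... H=7.
 * Assumption: chunk size = 1 block, as in the diagrams. */
typedef struct {
    int disk;    /* 0 = disk1, 1 = disk2, ... */
    long offset; /* block index on that disk */
} location;

/* raid0 (stripe): consecutive blocks alternate between disks */
location map_stripe(long b, int ndisks) {
    assert(ndisks > 0 && b >= 0);
    location loc = { (int)(b % ndisks), b / ndisks };
    return loc;
}

/* linear: fill disk1 completely, then continue on disk2 */
location map_linear(long b, long blocks_per_disk) {
    assert(blocks_per_disk > 0 && b >= 0);
    location loc = { (int)(b / blocks_per_disk), b % blocks_per_disk };
    return loc;
}

/* raid1: every disk holds every block at the same offset, so any
 * disk can serve the read; the caller picks one. */
location map_mirror(long b, int chosen_disk) {
    assert(b >= 0 && chosen_disk >= 0);
    location loc = { chosen_disk, b };
    return loc;
}

/* Print one row per disk using letters, like the diagrams above. */
void print_stripe_layout(long nblocks, int ndisks) {
    assert(nblocks >= 0 && nblocks <= 26 && ndisks > 0);
    for (int d = 0; d < ndisks; d++) {
        printf("disk%d: ", d + 1);
        for (long b = 0; b < nblocks; b++)
            if (map_stripe(b, ndisks).disk == d)
                putchar('A' + (int)b);
        putchar('\n');
    }
}
```

print_stripe_layout(8, 2) prints the raid0 (stripe) rows from the diagram: "disk1: ACEG" and "disk2: BDFH".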

If you want to read ABCDEFGH, the best speed will be raid0 (stripe):
you can read A+B, C+D, E+F, G+H in parallel with little disk/head
movement.
Could raid1 help? Maybe... if you have 2 programs reading ABCDEFGH and
no cache/buffer, one program can use disk1 and the other disk2; that's
the best case. Or raid0 (linear), if one program reads ABCD and another
EFGH, and afterwards program 1 reads EFGH and program 2 reads ABCD.

The problem here is:
1) read speed / transfer rate (more RPM = more MB/s),
2) access time (more access time = more latency; access time depends on
RPM and on head movement time, which depends on disk size: 2.5", 3.5" or
1.8"). Some 'normal' numbers for the rotational part (one full
revolution, i.e. the worst-case rotational latency; the average is half
of this, and seek time comes on top):
    7200rpm = 8.33ms
    10000rpm = 6ms
    15000rpm = 4ms
    ssd = ~0.1ms access time (firmware: SATA protocol + internal address
table + queue + other internal firmware tasks)
3) total time to read:
for a hard disk:
total time = access time (from the current platter and head position to
the new head and platter position) + number of bytes / transfer rate
for an SSD:
total time = access time + internal lookup (some SSDs do internal
reallocation) + memory read time
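The rotational numbers in item 2) are 60000 / RPM milliseconds. A small C sketch of the hard-disk formula in item 3), assuming an average rotational delay of half a revolution and a caller-supplied average seek time; revolution_ms and hdd_read_ms are hypothetical names.

```c
#include <assert.h>

/* Time for one full platter revolution in milliseconds. This is the
 * worst-case rotational latency; the average is half of it. */
double revolution_ms(double rpm) {
    assert(rpm > 0);
    return 60000.0 / rpm;
}

/* Item 3) for a hard disk: access time + bytes / transfer rate.
 * Assumptions: seek_ms is an average seek time you supply, the
 * rotational delay is half a revolution, and rate_mb_s is the
 * sustained transfer rate with 1 MB = 1e6 bytes. */
double hdd_read_ms(double seek_ms, double rpm, double bytes, double rate_mb_s) {
    assert(seek_ms >= 0 && bytes >= 0 && rate_mb_s > 0);
    double access_ms = seek_ms + revolution_ms(rpm) / 2.0;
    double transfer_ms = bytes / (rate_mb_s * 1e6) * 1000.0;
    return access_ms + transfer_ms;
}
```

For example, with an assumed 8.5ms seek, 7200rpm and 140MB/s, hdd_read_ms(8.5, 7200, 1e6, 140) comes to roughly 19.8ms for 1MB, and more than half of that is positioning, which is why many small scattered reads are expensive on hard disks.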

Striping keeps access time small: one disk reads A and is then near C,
while the other disk reads B and is then near D. With a sequential read
of ABCD you have 2 'reads' per drive, while with linear one drive does
all 4 'reads'.
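A C sketch that counts how many blocks each disk serves for a sequential read, under the same one-block-chunk assumption as the diagrams; count_disk_load is a hypothetical name. For ABCD (blocks 0..3) on two disks, the stripe layout gives 2 blocks per disk, while the linear layout puts all 4 on disk1.

```c
#include <assert.h>

/* Fill per_disk[0..ndisks-1] with the number of blocks each disk must
 * read to serve a sequential read of blocks [first, first + count).
 * stripe != 0: striped layout, block b lives on disk b % ndisks.
 * stripe == 0: linear layout, block b lives on disk b / blocks_per_disk. */
void count_disk_load(long first, long count, int ndisks, long blocks_per_disk,
                     int stripe, long per_disk[]) {
    assert(ndisks > 0 && blocks_per_disk > 0 && first >= 0 && count >= 0);
    for (int d = 0; d < ndisks; d++)
        per_disk[d] = 0;
    for (long b = first; b < first + count; b++) {
        long d = stripe ? b % ndisks : b / blocks_per_disk;
        assert(d < ndisks); /* the read must fit inside the array */
        per_disk[d]++;
    }
}
```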


As a separate question, what should be the theoretical performance of raid5?

In my tests I read 1GB and throw away the data:
dd if=/dev/md0 of=/dev/null bs=1M count=1000
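A rough C equivalent of that dd test, as a sketch: sequential_read is a hypothetical helper that reads up to count blocks of bs bytes, discards them, and reports throughput. Like a plain dd read it goes through the page cache, so a second run over the same data can look much faster than the disks really are.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Read up to `count` blocks of `bs` bytes from `path` and discard them.
 * Returns bytes read, or -1 if the file or buffer cannot be obtained.
 * *mb_per_s receives the throughput, with 1 MB = 1e6 bytes. */
long long sequential_read(const char *path, size_t bs, long count, double *mb_per_s) {
    assert(bs > 0 && count >= 0);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    char *buf = malloc(bs);
    if (!buf) {
        fclose(f);
        return -1;
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    for (long i = 0; i < count; i++) {
        size_t n = fread(buf, 1, bs, f);
        total += (long long)n;
        if (n < bs)
            break; /* end of file or read error */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (mb_per_s)
        *mb_per_s = secs > 0 ? (double)total / secs / 1e6 : 0.0;
    free(buf);
    fclose(f);
    return total;
}
```

sequential_read("/dev/md0", 1 << 20, 1000, &rate) mirrors bs=1M count=1000; reading the device needs the same permissions as dd.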

With 4 fairly fast HDDs I get:

raid0: ~540MB/s
raid10: 220MB/s
raid5: ~165MB/s
raid1: ~140MB/s  (single disk speed)

For 4 disks raid0 seems like suicide, but for my system drive the
speed advantage is so great I'm tempted to try it anyway and use rsync
to keep a constant backup.
I don't know much about raid5, but I think its reads behave close to
raid0 (stripe), since raid5 also stripes data across the disks; this
needs checking with other guys.
Cheers for your responses,

Liam



On Wed, May 4, 2011 at 8:42 AM, Roberto Spadim [off-list ref] wrote:
hmm...
In a user program we use:
f = fopen(path, "rb"); n = fread(buffer, 1, buffer_size, f); fclose(f);

buffer_size is the problem, since it can be very small (many reads) or
very big (a memory problem, but a very nice request to optimize at the
device block level).
If we have a big buffer_size, we can split it across disks (SSD).
If we have a small buffer_size, we can't split it (unless readahead is
very big).
Problem: we need memory (cache/buffer).
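To make the buffer_size point concrete, a hypothetical C helper that reads a whole stream with a given fread() buffer size and counts the calls. It only counts user-level calls: stdio's own buffer and kernel readahead sit between fread() and the disk, so this is not the number of device requests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read stream `f` to the end using a buffer of buffer_size bytes.
 * Returns the number of fread() calls, including the final short
 * (possibly empty) one, or -1 if the buffer cannot be allocated.
 * *total receives the number of bytes read. */
long count_freads(FILE *f, size_t buffer_size, long long *total) {
    assert(f && total && buffer_size > 0);
    char *buf = malloc(buffer_size);
    if (!buf)
        return -1;
    long calls = 0;
    size_t n;
    *total = 0;
    do {
        n = fread(buf, 1, buffer_size, f);
        calls++;
        *total += (long long)n;
    } while (n == buffer_size); /* a short read means EOF or error */
    free(buf);
    return calls;
}
```

With a 10000-byte file, a 4096-byte buffer takes 3 calls and a 1MiB buffer takes 1.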

The question is: is readahead better for SSDs, or is a bigger
'buffer_size' in the user program better?
Or a filesystem change to a bigger 'block' size? Then it wouldn't
matter if the user passes a small buffer_size to fread; the filesystem
would always read a lot of data at the device block layer.
What's better? Other ideas?
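Two existing knobs for the readahead/buffer question, in a C sketch (open_for_streaming is a hypothetical name, and the buffer size is the caller's choice, not a recommendation): setvbuf() gives stdio a larger internal buffer so small fread() calls are served from large read() calls, and posix_fadvise() with POSIX_FADV_SEQUENTIAL hints that access is sequential; the Linux man page says this doubles the file's readahead window. Device-level readahead is a separate setting (blockdev --setra).

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Open `path` for a large sequential read. stdio_buf must stay valid
 * until fclose(); passing our own buffer makes sure the requested size
 * is used (with a NULL buffer the C library may pick its own size). */
FILE *open_for_streaming(const char *path, char *stdio_buf, size_t size) {
    assert(path && stdio_buf && size > 0);
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    /* setvbuf() must come before any other operation on the stream.
     * _IOFBF = fully buffered: stdio refills up to `size` bytes at once. */
    if (setvbuf(f, stdio_buf, _IOFBF, size) != 0) {
        fclose(f);
        return NULL;
    }
    /* Advisory hint only; reads work even if it fails, so the return
     * value is deliberately ignored. */
    (void)posix_fadvise(fileno(f), 0, 0, POSIX_FADV_SEQUENTIAL);
    return f;
}
```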

I don't know how the Linux kernel handles a very big fread in memory,
for example:
fread(buffer, 1, 1000000, f); // ~1MB
Will Linux split that 'single' fread into many reads at the block
layer, each read one block in size (512 bytes/4096 bytes)?
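Some arithmetic for the 1MB question: a hypothetical C helper counting how many 512- or 4096-byte blocks a byte range touches. It says nothing about how the kernel packages them into I/O; as far as I know the block layer merges adjacent blocks into larger requests rather than issuing one request per block.

```c
#include <assert.h>

/* Number of blocks of block_size bytes touched by the byte range
 * [offset, offset + len). Pure arithmetic, no I/O. */
long long blocks_touched(long long offset, long long len, long long block_size) {
    assert(offset >= 0 && len >= 0 && block_size > 0);
    if (len == 0)
        return 0;
    long long first = offset / block_size;
    long long last = (offset + len - 1) / block_size;
    return last - first + 1;
}
```

blocks_touched(0, 1000000, 512) is 1954 and blocks_touched(0, 1000000, 4096) is 245.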

2011/5/4 Brad Campbell [off-list ref]:
On 04/05/11 13:30, Drew wrote:
It seemed logical to me that if two disks had the same data and we
were reading an arbitrary amount of data, why couldn't we split the
read across both disks? That way we get the benefits of pulling from
multiple disks in the read case while accepting the penalty of a write
being as slow as the slowest disk.
I would have thought that, as you'd be skipping alternate "stripes" on
each disk, you minimise the benefit of a readahead buffer and get
subjected to seek and rotational latency on both disks. Overall your
benefit would be slim to immeasurable. Now on SSDs I could see it
providing some extra oomph, as you suffer none of the mechanical
latency penalties.
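A user-space sketch of the split Liam and Drew describe, assuming two file descriptors over identical data (like raid1 members): read the first half of the range from one and the second half from the other, so each disk streams one contiguous region instead of skipping alternate stripes as Brad describes. split_mirror_read is a hypothetical name, not an md feature, and the halves are read one after the other here for clarity; an actual speedup would need them issued concurrently (threads or async I/O).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes at offset into buf: first half from fd_a, second half
 * from fd_b. Both fds are assumed to hold identical data. pread() does
 * not move the file offset, so fd_a and fd_b may even be the same fd.
 * Returns bytes read (short at end of data) or -1 on error. */
ssize_t split_mirror_read(int fd_a, int fd_b, char *buf, size_t len, off_t offset) {
    assert(buf != NULL || len == 0);
    size_t half = len / 2;
    const size_t part_len[2] = { half, len - half };
    const int fds[2] = { fd_a, fd_b };
    size_t done = 0;
    for (int i = 0; i < 2; i++) {
        size_t want = part_len[i];
        while (want > 0) { /* pread may return fewer bytes than asked */
            ssize_t n = pread(fds[i], buf + done, want, offset + (off_t)done);
            if (n < 0) {
                perror("pread");
                return -1;
            }
            if (n == 0)
                return (ssize_t)done; /* end of data */
            done += (size_t)n;
            want -= (size_t)n;
        }
    }
    return (ssize_t)done;
}
```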

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
Roberto Spadim
Spadim Technology / SPAEmpresarial

