Re: mdadm raid1 read performance
From: Roberto Spadim <hidden>
Date: 2011-05-04 23:35:19
2011/5/4 Liam Kurmos [off-list ref]:
Thanks to all who replied on this. I somewhat naively assumed that having 2 disks with the same data would mean a read speed similar to raid0 should be the norm (and I think this is a very popular misconception). I was neglecting the seek time to skip alternate blocks, which I guess must be the flaw. In theory though, if I was reading a larger file, couldn't one disk start reading at the beginning into a buffer and one start reading from halfway (assuming 2 disks) and hence get close to 2x single disk speed?
Hmm... maybe. That is what LINEAR does, and it depends on how Linux divides one
large read into small reads, and on how the program uses fread(): with many
small freads, or with one big fread.
Look at the layouts:
1 disk blocks:
disk1: ABCDEFGH
raid0 (stripe) 2 disks
disk1: ACEG
disk2: BDFH
raid1 (no stripe) 2 disks
disk1: ABCDEFGH
disk2: ABCDEFGH
raid0 (linear) 2 disks
disk1: ABCD
disk2: EFGH
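
The layouts above can be reproduced with a small Python sketch. It assumes one block per stripe chunk and an even split for linear; real md arrays use a configurable chunk size, and the function name `layout` is only for illustration:

```python
def layout(blocks, n_disks, mode):
    """Return one list of blocks per disk for the given layout (illustration only)."""
    if mode == "stripe":
        # raid0 stripe: block i goes to disk i % n_disks
        return [blocks[i::n_disks] for i in range(n_disks)]
    if mode == "mirror":
        # raid1: every disk holds a full copy
        return [list(blocks) for _ in range(n_disks)]
    if mode == "linear":
        # linear: fill disk 0 first, then disk 1, and so on
        per_disk = -(-len(blocks) // n_disks)  # ceiling division
        return [blocks[i * per_disk:(i + 1) * per_disk] for i in range(n_disks)]
    raise ValueError(f"unknown mode: {mode}")


blocks = list("ABCDEFGH")
for mode in ("stripe", "mirror", "linear"):
    print(mode, ["".join(disk) for disk in layout(blocks, 2, mode)])
# stripe ['ACEG', 'BDFH']
# mirror ['ABCDEFGH', 'ABCDEFGH']
# linear ['ABCD', 'EFGH']
```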
If you want to read ABCDEFGH, the best speed comes from raid0 (stripe):
you can read A+B, C+D, E+F, G+H with little disk/head movement.
Could raid1 help? Maybe. If you have 2 programs reading ABCDEFGH
and you don't have cache/buffer, one program can use disk1 and
the other disk2; that is the best speed. The same goes for raid0 (linear)
if one program reads ABCD and the other EFGH, and afterwards they swap:
program 1 reads EFGH and program 2 reads ABCD.
The problem here is:
1) read speed (more RPM = more MB/s),
2) access time (more access time = more latency; access time depends on RPM
and on head move time, which depends on disk size: 2.5", 3.5" or 1.8").
Some 'normal' numbers (for hard disks this is the time of one full
rotation, 60000/RPM ms):
7200rpm = 8.33ms access time
10000rpm = 6ms access time
15000rpm = 4ms access time
ssd = 0.1ms access time (firmware: sata protocol + internal address
table + queue + other internal firmware tasks)
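
The hard disk numbers above match the time of one full platter rotation. A quick check in Python, as a sketch only: the average rotational latency (half a rotation) is added here for comparison and is not a figure from this thread, and head seek time is ignored:

```python
def rotation_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


for rpm in (7200, 10000, 15000):
    full = rotation_ms(rpm)
    # On average the wanted sector is half a rotation away.
    print(f"{rpm} rpm: {full:.2f} ms per rotation, {full / 2:.2f} ms average rotational latency")
# 7200 rpm: 8.33 ms per rotation, 4.17 ms average rotational latency
# 10000 rpm: 6.00 ms per rotation, 3.00 ms average rotational latency
# 15000 rpm: 4.00 ms per rotation, 2.00 ms average rotational latency
```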
3) total time to read:
for hard disk:
total time to read = access time (from current disk position and
current head position, to new head position and new disk position) +
number of bytes / read speed
for ssd:
total time to read = access time + internal information search (some
ssds have internal reallocation) + memory read time
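
A minimal sketch of the hard disk model in Python. The function name `read_time_ms` and the example inputs are assumptions for illustration; 140 MB/s is the single disk speed reported later in this thread, and 8.33 ms is the 7200rpm figure above:

```python
def read_time_ms(access_ms, n_bytes, mb_per_s):
    """Total read time = access time + transfer time (bytes / speed)."""
    transfer_ms = n_bytes / (mb_per_s * 1_000_000) * 1000
    return access_ms + transfer_ms


# One 1 MB read versus 256 separate 4 KB reads that each pay the access time
# (worst case, assuming no readahead and no request merging).
one_big = read_time_ms(8.33, 1_000_000, 140)
many_small = 256 * read_time_ms(8.33, 4096, 140)
print(f"one 1 MB read:    {one_big:.2f} ms")
print(f"256 x 4 KB reads: {many_small:.2f} ms")
```

The point of the comparison is that for small scattered reads the access time dominates, while for one large sequential read the transfer time matters about as much.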
Stripe allows a small access time, since one disk reads A and is then near
C, while the other disk reads B and is then near D. With a sequential read of
ABCD you have 2 'reads' per drive, while with linear you have 4
'reads' on a single drive.
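
The count of 'reads' per drive can be checked with a short Python sketch, assuming 8 blocks on 2 disks, one block per stripe chunk, and hypothetical helper names:

```python
from collections import Counter


def disk_for_block(index, n_disks, total_blocks, mode):
    """Return which disk holds block number `index` (0-based)."""
    if mode == "stripe":
        return index % n_disks
    if mode == "linear":
        per_disk = -(-total_blocks // n_disks)  # ceiling division
        return index // per_disk
    raise ValueError(f"unknown mode: {mode}")


def reads_per_disk(request, n_disks, total_blocks, mode):
    """Count how many of the requested blocks land on each disk."""
    counts = Counter(disk_for_block(i, n_disks, total_blocks, mode) for i in request)
    return [counts.get(d, 0) for d in range(n_disks)]


request = range(4)  # blocks A, B, C, D
print(reads_per_disk(request, 2, 8, "stripe"))  # [2, 2]
print(reads_per_disk(request, 2, 8, "linear"))  # [4, 0]
```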
As a separate question, what should be the theoretical performance of raid5?
In my tests I read 1GB and throw away the data:
dd if=/dev/md0 of=/dev/null bs=1M count=1000
With 4 fairly fast hdd's I get:
raid0: ~540MB/s
raid10: 220MB/s
raid5: ~165MB/s
raid1: ~140MB/s (single disk speed)
For 4 disks raid0 seems like suicide, but for my system drive the speed advantage is so great I'm tempted to try it anyway and use rsync to keep a constant backup.
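
To compare the reported numbers, here is the ratio of each level to the single disk speed, as a small Python sketch. It takes the raid1 figure as the single disk baseline, as stated in the message, and treats the approximate values as exact:

```python
# Sequential read results reported above, in MB/s (4 disks).
results_mb_s = {"raid0": 540, "raid10": 220, "raid5": 165, "raid1": 140}
single = results_mb_s["raid1"]  # reported as single disk speed

for level, speed in results_mb_s.items():
    print(f"{level}: {speed / single:.2f}x single disk")
# raid0: 3.86x single disk
# raid10: 1.57x single disk
# raid5: 1.18x single disk
# raid1: 1.00x single disk
```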
I don't know much about raid5, but I think it's close to the raid0 linear or raid0 stripe algorithm; this needs checking with others on the list.
Cheers for your responses,
Liam
On Wed, May 4, 2011 at 8:42 AM, Roberto Spadim [off-list ref] wrote:
Hmm... in a user program we use:
file=fopen(); var=fread(file,buffer_size); fclose(file);
buffer_size is the problem, since it can be very small (many reads) or very big (a small memory problem, but a very nice request to optimize at the block device level).
If we have a big buffer_size, we can split it across disks (ssd).
If we have a small buffer_size, we can't split it (only if readahead is very big).
Problem: we need memory (cache/buffer).
So the question is: is readahead better for ssd? Or is a bigger 'buffer_size' in the user program better? Or a change of the filesystem block size to a bigger one, so that it doesn't matter if the user passes a small buffer_size to fread, because the filesystem will always read a lot of data at the block device layer? Which is better? Other ideas?
I don't know how the Linux kernel handles a very big fread, for example:
fread(file,1000000); // 1MB
Will Linux split the 'single' fread into many reads at the block layer, each read of 1 block size (512 bytes/4096 bytes)?
2011/5/4 Brad Campbell [off-list ref]:
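
To make the buffer_size point concrete, here is a Python sketch that counts how many read calls a program issues for the same file with a small and a big buffer. It only counts calls made by the program; what the kernel does at the block layer (readahead, merging or splitting requests) is not visible here. The helper name `count_reads` is hypothetical:

```python
import os
import tempfile


def count_reads(path, buffer_size):
    """Read the whole file in buffer_size pieces and count the read calls."""
    calls = 0
    # buffering=0 disables Python's own buffering, so each read() is one request
    with open(path, "rb", buffering=0) as f:
        while f.read(buffer_size):
            calls += 1
    return calls


# Create a 1 MB scratch file to read back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 1_000_000)
    path = tmp.name

try:
    print("4 KB buffer:", count_reads(path, 4096), "reads")
    print("1 MB buffer:", count_reads(path, 1_000_000), "reads")
finally:
    os.remove(path)
```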
On 04/05/11 13:30, Drew wrote:
It seemed logical to me that if two disks had the same data and we were reading an arbitrary amount of data, why couldn't we split the read across both disks? That way we get the benefits of pulling from multiple disks in the read case while accepting the penalty of a write being as slow as the slowest disk.

I would have thought that, as you'd be skipping alternate "stripes" on each disk, you minimise the benefit of a readahead buffer and get subjected to seek and rotational latency on both disks. Overall your benefit would be slim to immeasurable. Now on SSDs I could see it providing some extra oomph, as you suffer none of the mechanical latency penalties.
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html