Thread: 12 messages, 6 authors, 2011-12-31

Re: RAID5 alignment issues with 4K/AF drives (WD green ones)

From: Marcus Sorensen <hidden>
Date: 2011-12-30 06:09:07

I think we need more info on his test. If he's running the dd until he
exhausts his writeback to see what the disk speed is, then yes, he'll
run into having to read stripes to calculate parity, since he'll be
forced to write 4k blocks synchronously. (That applies prior to kernel
3.1; from 3.1 on, his thread still gets to use dirty memory but is just
forced to sleep if the disk can't keep up.) I have seen bumping the
stripe cache help significantly in these cases, and in real-world
workloads where you're not writing large full-stripe files.
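
For anyone who wants to try the stripe cache bump, here is a rough sketch. The array name md0, the value 4096 and the 4-disk count are assumptions for illustration only. The helper estimates the memory cost using the formula given in the kernel's md documentation (page size x number of disks x cache entries), and assumes 4096-byte pages unless told otherwise.

```shell
# Estimate stripe cache memory in KiB: entries * nr_disks * page_size.
# Arguments: entries, nr_disks, optional page size in bytes (default 4096).
stripe_cache_mem_kib() {
    entries=$1
    disks=$2
    page=${3:-4096}
    echo $(( entries * disks * page / 1024 ))
}

# Hypothetical usage on an array called md0 (needs root):
#   cat /sys/block/md0/md/stripe_cache_size       # current value
#   stripe_cache_mem_kib 4096 4                   # memory cost of 4096 entries on 4 disks
#   echo 4096 > /sys/block/md0/md/stripe_cache_size
```

The setting does not persist across reboots, so it is cheap to experiment with.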

Instead of doing a monster sequential write to find my disk speed, I
generally find it more useful to add conv=fdatasync to a dd so that
the dirty buffers are utilized as they are in most real-world working
environments, but I don't get a result until the test is on-disk.
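
Something along these lines is what I mean. It is a sketch only: the function name and the target path are made up for illustration, and bs=1M with conv=fdatasync assumes GNU dd.

```shell
# Write count_mib MiB of zeros to target through the page cache; dd only
# reports its timing after fdatasync() has pushed the data to disk.
timed_write() {
    target=$1
    count_mib=$2
    dd if=/dev/zero of="$target" bs=1M count="$count_mib" conv=fdatasync
}

# Hypothetical usage on a filesystem that sits on the array:
#   timed_write /mnt/raid/ddtest.bin 4096
```

The throughput figure dd prints on stderr then includes the flush, so it reflects what actually reached the disks.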

On Thu, Dec 29, 2011 at 10:45 PM, Marcus Sorensen [off-list ref] wrote:
On Thu, Dec 29, 2011 at 9:52 PM, Mikael Abrahamsson [off-list ref] wrote:
quoted
On Thu, 29 Dec 2011, Marcus Sorensen wrote:
quoted
My only suggestion would be to experiment with various partitioning,

Poster already said they're not partitioned.
Correct. Using partitioning allows you to adjust the alignment, so for
example if the MD superblock at the front moves the start of the
exported MD device out of alignment with the base disks, you could
compensate for it by starting your partition on the correct offset.
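
A quick way to sanity-check an offset is sketched below. It assumes the offset is expressed in 512-byte sectors, which is how mdadm --examine reports "Data Offset" for 1.x metadata; the helper name and the sample values are hypothetical.

```shell
# Succeeds if an offset given in 512-byte sectors lands on a 4 KiB
# boundary (8 sectors of 512 bytes make up one 4 KiB physical sector).
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

# Hypothetical usage: read the "Data Offset" value from
#   mdadm --examine /dev/sdX
# and check it, e.g.:
#   is_4k_aligned 2048 && echo aligned || echo misaligned
```

The same check applies to a partition start sector if the drives were partitioned.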

quoted
quoted
On Thu, Dec 29, 2011 at 7:00 PM, Zdenek Kaspar [off-list ref]
wrote:
quoted
On 30.12.2011 0:28, Michele Codutti wrote:
quoted
The drives are not partitioned. I'm using the default chunk size (512K)
and the default metadata superblock version (1.2).

My recommendation would be to look into the stripe-cache settings and check
iostat -x 5 output. What is most likely happening is that writes to the
raid5 also trigger some reads (most likely to calculate parity) instead of
being pure writes. iostat will confirm if this is indeed the case.
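
If iostat is not at hand, the same question can be answered roughly from the raw counters in /proc/diskstats, where field 4 is reads completed and field 8 is writes completed. This is a sketch, not a replacement for iostat; the helper name and the device name sda are assumptions.

```shell
# Print "reads_completed writes_completed" for one block device.
# Arguments: device name, optional stats file (default /proc/diskstats).
disk_io_counts() {
    dev=$1
    stats=${2:-/proc/diskstats}
    awk -v d="$dev" '$3 == d { print $4, $8 }' "$stats"
}

# Hypothetical usage: sample a member disk before and after a write-only test.
#   disk_io_counts sda
#   ... run the dd ...
#   disk_io_counts sda
```

If the read counter on the member disks climbs during a write-only test, the array is reading in order to write.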

Also, raid5 is not recommended for drives of 2TB or larger; use RAID6 instead
<http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162>.
If he's writing full stripes, he doesn't need to read anything to
calculate parity. I'm not sure how the MD layer determines this, though.
Unless he's adding a sync or O_DIRECT flag to his test, he should be
writing full stripes regardless of the block size he sets.
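
For reference, a full stripe on raid5 holds one chunk per data disk, so it is chunk size x (disks - 1). A small sketch of the arithmetic follows; the 4-disk count is an assumption, since the number of drives isn't stated here, and the helper name is made up.

```shell
# Full-stripe data size in KiB for raid5: one disk's worth of each stripe
# is parity, so only (disks - 1) chunks carry data.
# Arguments: chunk size in KiB, total number of member disks.
full_stripe_kib() {
    chunk_kib=$1
    disks=$2
    echo $(( chunk_kib * (disks - 1) ))
}

# Hypothetical usage with the default 512K chunk on 4 disks:
#   full_stripe_kib 512 4
# The result could be used as the dd block size (bs=<result>k) when
# testing with O_DIRECT, so that each write covers a whole stripe.
```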