Thread: 12 messages, 6 authors, 2011-12-31

Re: RAID5 alignment issues with 4K/AF drives (WD green ones)

From: Zdenek Kaspar <hidden>
Date: 2011-12-30 23:17:34

On 30.12.2011 22:04, Michele Codutti wrote:
Hi all, thanks for the tips. I'll reply to everyone in one aggregated message:
quoted
Just a thought, but do you have the "XP mode" jumper removed on all drives?
Yes.
quoted
Instead of doing a monster sequential write to find my disk speed, I
generally find it more useful to add conv=fdatasync to a dd so that
the dirty buffers are utilized as they are in most real-world working
environments, but I don't get a result until the test is on-disk.
Done, same results (40 MB/s)
quoted
quoted
quoted
My only suggestion would be to experiment with various partitioning,

Poster already said they're not partitioned.
Correct. Using partitioning allows you to adjust the alignment. For
example, if the MD superblock at the front moves the start of the
exported MD device out of alignment with the base disks, you could
compensate for it by starting your partition at the correct offset.
Done. I've created one big partition using parted with "-a optimal".
The partition layout is (fdisk friendly output):
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00077f06

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect
I redid the test with the "conv=fdatasync" option as above: same results.
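
The alignment arithmetic for the layout above can be sketched as follows. This is only an illustration: the 512-byte logical sector comes from the fdisk output, while the 4096-byte physical sector is an assumption based on the drives being Advanced Format models (they report 512/512). The helper name is_aligned is made up for this example, and start sector 63 is included only as the classic misaligned case for contrast.

```python
LOGICAL_SECTOR = 512     # bytes, as reported by fdisk above
PHYSICAL_SECTOR = 4096   # bytes, assumed for these Advanced Format drives


def is_aligned(start_sector, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return True if the byte offset of start_sector is a multiple
    of the physical sector size."""
    return (start_sector * logical) % physical == 0


if __name__ == "__main__":
    # 63 is the old DOS-style default start, 2048 is what parted chose above.
    for start in (63, 2048):
        offset = start * LOGICAL_SECTOR
        print(f"start sector {start}: byte offset {offset}, "
              f"aligned: {is_aligned(start)}")
```

By this check a partition starting at sector 2048 sits on a 4 KiB boundary, so the partition start alone would not explain the slow writes. It says nothing about where md places the data area inside the partition.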
quoted
My only suggestion would be to experiment with various partitioning,
starting the first partition at 2048s or various points to see if you
can find a placement that aligns the partitions properly. I'm sure
there's an explanation, but I'm not in the mood to put on my thinking
hat to figure it out at the moment. May also be worth using a
different superblock version, as 1.2 is 4k from the start of the
drives, which might be messing with alignment (although I would expect
it on all arrays), worth trying the .9 which goes to the end of the
device.
I've tried all the superblock versions 0, 0.9, 1, 1.1 and 1.2. Same results.
quoted
No, those drives generally DON'T report 4k to the OS, even though they
are. If they were, there'd be fewer problems. They lie and say 512b
sectors for compatibility.
Yes, they are dirty liars. The same goes for the EADS series, not only the EARS ones.
quoted
My recommendation would be to look into the stripe-cache settings and check
iostat -x 5 output. What is most likely happening is that when writing to
the raid5, it's reading some (to calculate parity most likely) and not just
writing. iostat will confirm if this is indeed the case.
Could you explain how I could look into the stripe-cache settings?
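
On the stripe-cache question: for RAID4/5/6 arrays the md driver exposes the setting in sysfs as md/stripe_cache_size under the array's block device. The sketch below reads it and estimates the memory it uses, following the kernel md documentation's formula (page size * number of disks * stripe_cache_size). The array name md0, the 4096-byte page size, and the function names are assumptions for illustration, not taken from the thread.

```python
from pathlib import Path


def stripe_cache_path(md_name="md0", sysfs_root="/sys"):
    """Build the sysfs path of the stripe cache setting (md0 is an assumed name)."""
    return Path(sysfs_root) / "block" / md_name / "md" / "stripe_cache_size"


def read_stripe_cache_size(md_name="md0", sysfs_root="/sys"):
    """Return the current number of stripe cache entries for the array."""
    return int(stripe_cache_path(md_name, sysfs_root).read_text().strip())


def stripe_cache_memory_bytes(entries, nr_disks, page_size=4096):
    """Estimate memory used: page_size * nr_disks * stripe_cache_size."""
    return entries * nr_disks * page_size


if __name__ == "__main__":
    path = stripe_cache_path()
    if path.exists():
        entries = read_stripe_cache_size()
        # 3 disks matches the three-drive array discussed in this thread.
        print(f"{path}: {entries} entries, "
              f"about {stripe_cache_memory_bytes(entries, 3)} bytes for 3 disks")
    else:
        print(f"{path} not found (is there a RAID4/5/6 array with that name?)")
```

Changing the value means writing a new number to the same file as root. Whether a larger cache helps in this case is something only a test on the affected array can show.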
This is one of many similar outputs from iostat -x 5 from the initial rebuilding phase:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   13.29    0.00    0.00   86.71
Device:   rrqm/s   wrqm/s      r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      6585.60     0.00  4439.20     0.00  44099.20      0.00    19.87     6.14    1.38    1.38    0.00   0.09  39.28
sdb      6280.40     0.00  4746.60     0.00  44108.00      0.00    18.59     5.20    1.10    1.10    0.00   0.07  35.04
sdc         0.00  9895.40     0.00  1120.80      0.00  44152.80    78.79    12.03   10.73    0.00   10.73   0.82  92.32
I also built a RAID6 (with one drive missing): same results.
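
As a sanity check on the iostat sample above, the average request size can be recomputed from the throughput and request-rate columns and compared with avgrq-sz. The sketch assumes the usual iostat units (rkB/s and wkB/s in kilobytes, avgrq-sz in 512-byte sectors). The numbers are copied from the table; the function names are made up for this example.

```python
def avg_request_kb(kb_per_s, requests_per_s):
    """Average request size in kB from throughput and request rate."""
    return kb_per_s / requests_per_s


def sectors_to_kb(sectors, sector_size=512):
    """Convert a size in 512-byte sectors (the avgrq-sz unit) to kB."""
    return sectors * sector_size / 1024


# (device, kB/s, requests/s, avgrq-sz) taken from the sample above:
# sda and sdb are only read, sdc is only written during the rebuild.
SAMPLES = [
    ("sda", 44099.20, 4439.20, 19.87),
    ("sdb", 44108.00, 4746.60, 18.59),
    ("sdc", 44152.80, 1120.80, 78.79),
]

if __name__ == "__main__":
    for dev, kb_s, req_s, avgrq in SAMPLES:
        print(f"{dev}: {avg_request_kb(kb_s, req_s):.2f} kB/request computed, "
              f"{sectors_to_kb(avgrq):.2f} kB/request from avgrq-sz")
```

The two figures agree for each device, so the sample is internally consistent: roughly 44 MB/s is read from each of sda and sdb and the same amount is written to sdc, which is near saturation.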
quoted
There must be some misalignment somewhere :(
Yes, it's the same behavior.
quoted
Do all drives really report as 4K to the OS - physical_block_size, logical_block_size under
/sys/block/sdX/queue/ ??
No, they lie about the block size, as you can also see in the fdisk output above.
quoted
NB: how does it perform with partitions starting at sector 2048 (check
all disks with fdisk -lu /dev/sdX).
They perform the same.

Any other suggestion?

I almost forgot: I've also booted OpenSolaris and created a ZFS pool (aligned to 4k sectors) from the same three drives, and they perform very well, both individually and together. I know that I'm comparing apples and oranges, but ... there must be a solution!
WTF is the jumper for, then (on a 512B drive)?
Does it somehow change any of these:
/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/alignment_offset
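
One way to answer that is to dump the three attributes for each drive, once with the jumper fitted and once without, and compare. A minimal sketch follows; the device names sda, sdb and sdc are taken from the iostat output earlier in the thread, and the function name and the sysfs_root parameter are only there for illustration.

```python
from pathlib import Path

# Attribute name -> path relative to /sys/block/<dev>/, as listed above.
ATTRIBUTES = {
    "physical_block_size": "queue/physical_block_size",
    "logical_block_size": "queue/logical_block_size",
    "alignment_offset": "alignment_offset",
}


def read_block_topology(dev, sysfs_root="/sys"):
    """Return the reported sizes/offset for one disk; None where a file is missing."""
    base = Path(sysfs_root) / "block" / dev
    result = {}
    for name, rel in ATTRIBUTES.items():
        path = base / rel
        result[name] = int(path.read_text().strip()) if path.exists() else None
    return result


if __name__ == "__main__":
    for dev in ("sda", "sdb", "sdc"):
        print(dev, read_block_topology(dev))
```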

If osol can handle it (enforcing 4k), that's a good sign. (You used
ashift=12 for the pool, right?)

Z.