Re: RAID5 alignment issues with 4K/AF drives (WD green ones)
From: Zdenek Kaspar <hidden>
Date: 2011-12-30 23:17:34
On 30.12.2011 22:04, Michele Codutti wrote:
> Hi all, thanks for the tips. I'll reply to everyone in one aggregated
> message:
>
> > Just a thought, but do you have the "XP mode" jumper removed on all
> > drives?
>
> Yes.
>
> > Instead of doing a monster sequential write to find my disk speed, I
> > generally find it more useful to add conv=fdatasync to a dd so that the
> > dirty buffers are utilized as they are in most real-world working
> > environments, but I don't get a result until the test is on-disk.
>
> Done, same results (40 MB/s).
>
> > > My only suggestion would be to experiment with various partitioning,
> >
> > Poster already said they're not partitioned.
>
> Correct.
>
> > using partitioning allows you to adjust the alignment, so for example if
> > the MD superblock at the front moves the start of the exported MD device
> > out of alignment with the base disks, you could compensate for it by
> > starting your partition on the correct offset.
>
> Done. I've created one big partition using parted with "-a optimal".
> The partition layout is (fdisk friendly output):
>
> Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00077f06
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect
>
> I redid the test with the "conv=fdatasync" option as above: same results.
>
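For what it's worth, the partition start in the fdisk output above can be checked against a 4 KiB physical sector with simple arithmetic. A minimal sketch, assuming 512-byte logical sectors (as fdisk reports) and a 4096-byte physical sector (what these AF drives are believed to use internally); is_aligned is a made-up helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: report whether a start sector (counted in 512-byte
# logical sectors) falls on a physical-sector boundary.
# $1 = start sector, $2 = physical sector size in bytes (assumed 4096).
is_aligned() {
    start_sector=$1
    phys_bytes=${2:-4096}
    offset_bytes=$((start_sector * 512))
    if [ $((offset_bytes % phys_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "misaligned"
    fi
}

is_aligned 2048   # start of /dev/sdc1 in the fdisk output: 1048576 bytes
is_aligned 63     # the old DOS-style start, for comparison: 32256 bytes
```

This only covers the partition start. With the 1.1/1.2 superblocks the md data offset inside the partition may matter as well; mdadm --examine on a member device should show it.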
> > My only suggestion would be to experiment with various partitioning,
> > starting the first partition at 2048s or various points to see if you
> > can find a placement that aligns the partitions properly. I'm sure
> > there's an explanation, but I'm not in the mood to put on my thinking
> > hat to figure it out at the moment. May also be worth using a different
> > superblock version, as 1.2 is 4k from the start of the drives, which
> > might be messing with alignment (although I would expect it on all
> > arrays), worth trying the .9 which goes to the end of the device.
>
> I've tried all the superblock versions: 0, 0.9, 1, 1.1 and 1.2. Same
> results.
>
> > No, those drives generally DON'T report 4k to the OS, even though they
> > are. If they were, there'd be fewer problems. They lie and say 512b
> > sectors for compatibility.
>
> Yes, they are dirty liars. The same goes for the EADS series, not only
> for the EARS ones.
>
> > My recommendation would be to look into the stripe-cache settings and
> > check iostat -x 5 output. What is most likely happening is that when
> > writing to the raid5, it's reading some (to calculate parity most
> > likely) and not just writing. iostat will confirm if this is indeed the
> > case.
>
> Could you explain how I could look into the stripe-cache settings?
> This is one of many similar outputs from iostat -x 5 from the initial
> rebuilding phase:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.00    0.00   13.29    0.00    0.00   86.71
>
> Device:  rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util
> sda     6585.60    0.00 4439.20    0.00 44099.20     0.00    19.87     6.14   1.38    1.38    0.00  0.09 39.28
> sdb     6280.40    0.00 4746.60    0.00 44108.00     0.00    18.59     5.20   1.10    1.10    0.00  0.07 35.04
> sdc        0.00 9895.40    0.00 1120.80     0.00 44152.80    78.79    12.03  10.73    0.00   10.73  0.82 92.32
>
> I also built a RAID6 (with one drive missing): same results.
>
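On the stripe-cache question: for RAID4/5/6 arrays the md driver exposes the setting in sysfs as stripe_cache_size. A rough sketch of how to look at it, assuming the array is md0 (a guess, substitute the real name); stripe_cache_kib is a made-up helper that estimates the memory cost as entries * page size * member disks, with a 4096-byte page assumed:

```shell
#!/bin/sh
# Assumption: the array is /dev/md0; pass another md name as first argument.
MD=${1:-md0}
ATTR="/sys/block/$MD/md/stripe_cache_size"

# Hypothetical helper: approximate stripe cache memory in KiB.
# $1 = cache entries, $2 = member disks, $3 = page size in bytes (assumed 4096).
stripe_cache_kib() {
    entries=$1
    disks=$2
    page_bytes=${3:-4096}
    echo $((entries * page_bytes * disks / 1024))
}

# Show the current value if the attribute exists on this machine.
if [ -r "$ATTR" ]; then
    echo "current stripe_cache_size: $(cat "$ATTR")"
else
    echo "$ATTR not readable (no such array, or not RAID4/5/6)"
fi

# Estimated cost of 4096 entries on a 3-disk array, in KiB.
stripe_cache_kib 4096 3

# To raise it (as root), something like:
#   echo 4096 > /sys/block/md0/md/stripe_cache_size
```

Whether a bigger stripe cache helps here is untested; it is simply the knob the suggestion above refers to, and the value does not persist across reboots.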
> > There must be some misalignment somewhere :(
>
> Yes, it's the same behavior.
>
> > Do all drives really report as 4K to the OS - physical_block_size,
> > logical_block_size under /sys/block/sdX/queue/ ??
>
> No, they lie about the block size, as you can also see in the fdisk
> output above.
>
> > NB: how does it perform with partitions starting at sector 2048 (check
> > all disks with fdisk -lu /dev/sdX).
>
> They perform the same.
> Any other suggestion?
> I almost forgot: I've also booted OpenSolaris and created a ZFS pool
> (aligned to 4k sectors) from the same three drives, and they perform very
> well, individually and together. I know that I'm comparing apples and
> oranges but ... there must be a solution!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

WTF is the jumper for then? (on a 512B drive) Does it somehow change:

/sys/block/sdX/queue/physical_block_size
/sys/block/sdX/queue/logical_block_size
/sys/block/sdX/alignment_offset

If osol can handle it (enforcing 4k), it's a good sign.. (you used
ashift=12 for the pool, right?)

Z.
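
To compare those three attributes with and without the jumper, a small loop over the member disks is enough. A minimal sketch, assuming the disks are sda, sdb and sdc (the names in the iostat output; pass others as arguments); show_geometry is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the three sysfs attributes for one disk,
# or "n/a" where an attribute cannot be read on this machine.
show_geometry() {
    dev=$1
    for attr in queue/physical_block_size queue/logical_block_size alignment_offset; do
        f="/sys/block/$dev/$attr"
        if [ -r "$f" ]; then
            printf '%s %s %s\n' "$dev" "$attr" "$(cat "$f")"
        else
            printf '%s %s n/a\n' "$dev" "$attr"
        fi
    done
}

# Default to the three disks from the thread when no names are given.
[ $# -gt 0 ] || set -- sda sdb sdc
for d in "$@"; do
    show_geometry "$d"
done
```

Running it once per jumper setting and diffing the two outputs should show whether the jumper changes anything the kernel can see.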