Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
From: Song Liu <song@kernel.org>
Date: 2020-07-10 16:09:53
On Thu, Jul 9, 2020 at 6:27 AM Yufen Yu [off-list ref] wrote:
On 2020/7/9 7:55, Song Liu wrote:quoted
On Wed, Jul 8, 2020 at 6:15 AM Yufen Yu [off-list ref] wrote:quoted
On 2020/7/3 7:00, Song Liu wrote:quoted
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu [off-list ref] wrote:quoted
Hi, all For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5 will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64. However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may waste resource of disk bandwidth. To solve the problem, this patchset try to set stripe_size as a configuare value. The default value is 4096. We will add a new sysfs entry and set it by writing a new value, likely: echo 16384 > /sys/block/md1/md/stripe_sizeHigher level question: do we need to support page size that is NOT 4kB times power of 2? Meaning, do we need to support 12kB, 20kB, 24kB, etc. If we only supports, 4kB, 8kB, 16kB, 32kB, etc. some of the logic can be simpler.Yeah, I think we just support 4kb, 8kb, 16kb, 32kb... is enough. But Sorry that I don't know what logic can be simpler in current implementation. I mean it also need to allocate page, and record page offset.I was thinking about replacing multiplication/division with bit operations (shift left/right). But I am not very sure how much that matters in modern ARM CPUs. Would you mind running some benchmarks with this?To test multiplication/division and bit operation, I write a simple test case: $ cat normal.c int page_size = 65536; int stripe_size = 32768; //32KB int main(int argc, char *argv[]) { int i, j, count; int page, offset; if (argc != 2) return -1; count = atol(argv[1]); for (i = 0; i < count; i++) { for (j = 0; j < 4; j++) { page = page_size / stripe_size; offset = j * stripe_size; } } } $ cat shift.c int page_shift = 16; //64KB int stripe_shift = 15; //32KB int main(int argc, char *argv[]) { int i, j, count; int page, offset; if (argc != 2) return -1; count = atol(argv[1]); for (i = 0; i < count; i++) { for (j = 0; j < 4; j++) { page = 1 << (page_shift - stripe_shift); offset = j << stripe_shift; } } } Test them on a arm64 server, the result show there is a minor performance gap between multiplication/division and shift operation. [root@localhost shift]# time ./normal 104857600 real 0m1.199s user 0m1.198s sys 0m0.000s [root@localhost shift]# time ./shift 104857600 real 0m1.166s user 0m1.166s sys 0m0.000s For our implementation, page address and page offset are just computed when allocate stripe_size. After that, we just use the recorded value in sh->dev[i].page and sh->dev[i].offset. So, I think current implementation may not cause much overhead.
Sounds good. Let's keep this part as-is. Thanks, Song