Re: [PATCH -next 0/3] md/raid10: reduce lock contention for io

From: Yu Kuai <hidden>
Date: 2022-08-30 01:09:58
Also in: lkml

Hi, Paul!

On 2022/08/29 21:58, Paul Menzel wrote:

Dear Yu,


Thank you for your patches.

On 29.08.22 at 15:14, Yu Kuai wrote:

quoted

From: Yu Kuai <redacted>

Patch 1 fixes a small problem found by code review.
Patch 2 avoids holding resync_lock in the fast path.
Patch 3 avoids holding the lock in wake_up() in the fast path.

Test environment:

Architecture: aarch64
CPU: Huawei Kunpeng 920, with four NUMA nodes

RAID10 initialization:
mdadm --create /dev/md0 --level 10 --bitmap none --raid-devices 4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

Test command:
fio -name=0 -ioengine=libaio -direct=1 -group_reporting=1 \
    -randseed=2022 -rwmixread=70 -refill_buffers -filename=/dev/md0 \
    -numjobs=16 -runtime=60s -bs=4k -iodepth=256 -rw=randread

Test result:
before this patchset:    2.9 GiB/s
after this patchset:    6.6 GiB/s

Could you please give more details about the test setup, like the drives 
used?

The test setup is described above; four NVMe disks are used.

Did you use some tools like ftrace to figure out the bottleneck?

Yes, I'm sure the bottleneck is spin_lock(); specifically, threads from
multiple nodes try to grab the same lock. By the way, if I bind the
threads to a single node, performance also improves to 6.6 GiB/s
without this patchset.
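
For illustration only, here is a minimal userspace sketch of the idea
behind patch 2: regular I/O checks a sequence counter instead of taking
a lock. This is not the code from the patch. The struct and function
names are hypothetical, C11 atomics stand in for the kernel's seqlock
primitives, and writers are assumed to be serialized elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state that resync_lock guards. */
struct barrier_state {
	atomic_uint seq;        /* even: stable; odd: a writer is mid-update */
	atomic_int  barrier;    /* nonzero while resync holds the barrier */
	atomic_int  nr_pending; /* regular I/O currently in flight */
};

static void barrier_state_init(struct barrier_state *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->barrier, 0);
	atomic_init(&s->nr_pending, 0);
}

/* Slow path. Assumption: writers are serialized elsewhere (e.g. by a lock). */
static void raise_barrier_sketch(struct barrier_state *s)
{
	atomic_fetch_add(&s->seq, 1);     /* seq becomes odd */
	atomic_fetch_add(&s->barrier, 1);
	atomic_fetch_add(&s->seq, 1);     /* seq becomes even again */
}

static void lower_barrier_sketch(struct barrier_state *s)
{
	assert(atomic_load(&s->barrier) > 0);
	atomic_fetch_add(&s->seq, 1);
	atomic_fetch_sub(&s->barrier, 1);
	atomic_fetch_add(&s->seq, 1);
}

/*
 * Fast path for regular I/O: no lock is taken. Returns true if the I/O
 * may proceed. On false, the caller would fall back to a locked slow
 * path, which is not shown here.
 */
static bool wait_barrier_fast(struct barrier_state *s)
{
	unsigned int start = atomic_load(&s->seq);

	/* A writer is mid-update, or the barrier is already raised. */
	if ((start & 1) || atomic_load(&s->barrier) != 0)
		return false;

	atomic_fetch_add(&s->nr_pending, 1);
	if (atomic_load(&s->seq) == start)
		return true;                 /* no writer raced with us */

	atomic_fetch_sub(&s->nr_pending, 1); /* undo; caller takes slow path */
	return false;
}
```

In the uncontended case, readers in this sketch never wait on a lock
held by a thread on another node. A real implementation would also have
to wake a writer that is waiting for nr_pending to drain, which is left
out here.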

Thanks,
Kuai

quoted

Please note that on Kunpeng 920, memory access latency across nodes is
much worse than on the local node; on other architectures the
performance improvement might not be significant.

Yu Kuai (3):
   md/raid10: fix improper BUG_ON() in raise_barrier()
   md/raid10: convert resync_lock to use seqlock
   md/raid10: prevent unnecessary calls to wake_up() in fast path

  drivers/md/raid10.c | 88 +++++++++++++++++++++++++++++----------------
  drivers/md/raid10.h |  2 +-
  2 files changed, 59 insertions(+), 31 deletions(-)


Kind regards,

Paul
.