Thread (14 messages) 14 messages, 4 authors, 2019-08-20

Re: Kernel 4.14 + has 100 times higher IO latency than Kernel 4.4 with raid1

From: Jinpu Wang <jinpu.wang@cloud.ionos.com>
Date: 2019-08-05 08:34:17
Also in: lkml

On Fri, Aug 2, 2019 at 9:52 PM Paul Menzel [off-list ref] wrote:
Dear Jinpu,


On 02.08.19 16:48, Jinpu Wang wrote:
quoted
We found a problem regarding much higher IO latency when running
kernel 4.4.131 compare to 4.14.133, tried with latest upstream
5.3-rc2, same result.

Reproducer:
1 create md raid1 with 2 ram disks:
sudo mdadm -C /dev/md0 -l1 -n2 -e1.2 --bitmap=internal /dev/ram[0-1]
2 run fio command with rate_iops:
fio  --rw=write --ioengine=libaio --iodepth=32  --size=1000MB
--rate_iops=5000 --direct=1 --numjobs=1 --runtime=20 --time_based
--name=write-test --filename=/dev/md0

result on 4.4 kernel:
root@ib2:~# fio  --rw=write --ioengine=libaio --iodepth=32
--size=1000MB --rate_iops=5000 --direct=1 --numjobs=1 --runtime=20
--time_based --name=write-test --filename=/dev/md0
write-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1), CR=5000/0 IOPS: [W(1)] [100.0% done] [0KB/20008KB/0KB
/s] [0/5002/0 iops] [eta 00m:00s]
write-test: (groupid=0, jobs=1): err= 0: pid=3351: Fri Aug  2 15:31:26 2019
   write: io=400004KB, bw=19999KB/s, iops=4999, runt= 20001msec
     slat (usec): min=3, max=26, avg= 3.12, stdev= 0.36
     clat (usec): min=0, max=116, avg= 2.04, stdev= 1.33
      lat (usec): min=3, max=141, avg= 5.19, stdev= 1.39
     clat percentiles (usec):
      |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    2],
      | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    2],
      | 70.00th=[    2], 80.00th=[    2], 90.00th=[    2], 95.00th=[    3],
      | 99.00th=[    3], 99.50th=[    3], 99.90th=[    3], 99.95th=[    3],
      | 99.99th=[   86]
     bw (KB  /s): min=20000, max=20008, per=100.00%, avg=20005.54, stdev= 3.74
     lat (usec) : 2=3.37%, 4=96.60%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (usec) : 100=0.01%, 250=0.01%
   cpu          : usr=4.25%, sys=0.00%, ctx=198550, majf=0, minf=11
   IO depths    : 1=100.0%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued    : total=r=0/w=100001/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   WRITE: io=400004KB, aggrb=19999KB/s, minb=19999KB/s, maxb=19999KB/s,
mint=20001msec, maxt=20001msec

Disk stats (read/write):
     md0: ios=61/99539, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0,
aggrutil=0.00%

result on 5.3 kernel
root@ib1:/home/jwang# fio  --rw=write --ioengine=libaio --iodepth=32
--size=1000MB --rate_iops=5 --direct=1 --numjobs=1 --runtime=20
--time_based --name=write-test --filename=/dev/md0
write-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1), CR=5/0 IOPS: [W(1)] [100.0% done] [0KB/20KB/0KB /s]
[0/5/0 iops] [eta 00m:00s]
write-test: (groupid=0, jobs=1): err= 0: pid=1651: Fri Aug  2 17:16:18 2019
   write: io=413696B, bw=20683B/s, iops=5, runt= 20001msec
     slat (usec): min=2, max=51803, avg=1028.62, stdev=7250.96
     clat (usec): min=0, max=91, avg=17.76, stdev=28.07
      lat (usec): min=3, max=51892, avg=1046.50, stdev=7254.89
     clat percentiles (usec):
      |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
      | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    1],
      | 70.00th=[   19], 80.00th=[   44], 90.00th=[   68], 95.00th=[   80],
      | 99.00th=[   88], 99.50th=[   91], 99.90th=[   91], 99.95th=[   91],
      | 99.99th=[   91]
     bw (KB  /s): min=   20, max=   21, per=100.00%, avg=20.04, stdev= 0.21
     lat (usec) : 2=67.33%, 10=0.99%, 20=1.98%, 50=11.88%, 100=17.82%
   cpu          : usr=0.00%, sys=0.00%, ctx=77, majf=0, minf=10
   IO depths    : 1=68.3%, 2=2.0%, 4=4.0%, 8=7.9%, 16=15.8%, 32=2.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued    : total=r=0/w=101/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   WRITE: io=404KB, aggrb=20KB/s, minb=20KB/s, maxb=20KB/s,
mint=20001msec, maxt=20001msec

Disk stats (read/write):
     md0: ios=0/100, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0,
aggrutil=0.00%
   ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
   ram1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

During the tests the following kernel parameters are applied:
processor.max_cstate=0 idle=poll mitigations=off

Could anyone give us a hint, what could lead to such a huge difference?
Sorry, I have no idea, but as you can easily reproduce it with RAM
disks, bisecting the commit causing this quite easily.

Your tests should also be added to some test suite (kselftest)?


Kind regards,

Paul
Thanks Paul,

I'm  bisecting now, will report back the result.

Note: the result for 5.3 was done with rate_iops 5 by mistake, but
with 5000 it doesn't change the result.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help