Thread: 14 messages, 3 authors, 2020-12-16

Re: hybrid polling on an nvme doesn't seem to work with iodepth > 1 on 5.10.0-rc5

From: Pavel Begunkov <asml.silence@gmail.com>
Date: 2020-12-10 23:20:08

On 10/12/2020 23:12, Pavel Begunkov wrote:
On 10/12/2020 20:51, Andres Freund wrote:
Hi,

When using hybrid polling (i.e. echo 0 >
/sys/block/nvme1n1/queue/io_poll_delay) I see stalls with fio when using
an iodepth > 1. Sometimes fio hangs, other times the performance is
really poor. I reproduced this with SSDs from different vendors.
Can you get poll stats from debugfs while running with hybrid?
For both iodepth=1 and 32.
Even better if, for 32, you could show how it changes over time, i.e. cat it
several times while the test is running.
cat <debugfs>/block/nvme1n1/poll_stat

e.g. if already mounted
cat /sys/kernel/debug/block/nvme1n1/poll_stat
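One way to collect those repeated snapshots is a small loop around cat. This is only a sketch: sample_poll_stat is a made-up helper name, the defaults for count and interval are arbitrary, and reading debugfs normally requires root.

```sh
#!/bin/sh
# Sketch: print timestamped snapshots of a poll_stat file.
# sample_poll_stat is a hypothetical helper, not a kernel or fio tool.
sample_poll_stat() {
    file=$1            # e.g. /sys/kernel/debug/block/nvme1n1/poll_stat
    count=${2:-5}      # number of snapshots (assumed default)
    interval=${3:-1}   # seconds between snapshots (assumed default)
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        printf '=== sample %d at %s ===\n' "$i" "$(date +%T)"
        cat "$file" || return 1
        # Do not sleep after the final snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Started as root in a second terminal while the iodepth=32 job runs, something like `sample_poll_stat /sys/kernel/debug/block/nvme1n1/poll_stat 10 1` would give ten snapshots one second apart.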

$ echo -1 | sudo tee /sys/block/nvme1n1/queue/io_poll_delay
$ fio --ioengine io_uring --rw write --filesize 1GB --overwrite=1 --name=test --direct=1 --bs=$((1024*4)) --time_based=1 --runtime=10 --hipri --iodepth 1
93.4k iops

$ fio --ioengine io_uring --rw write --filesize 1GB --overwrite=1 --name=test --direct=1 --bs=$((1024*4)) --time_based=1 --runtime=10 --hipri --iodepth 32
426k iops

$ echo 0 | sudo tee /sys/block/nvme1n1/queue/io_poll_delay
$ fio --ioengine io_uring --rw write --filesize 1GB --overwrite=1 --name=test --direct=1 --bs=$((1024*4)) --time_based=1 --runtime=10 --hipri --iodepth 1
94.3k iops

$ fio --ioengine io_uring --rw write --filesize 1GB --overwrite=1 --name=test --direct=1 --bs=$((1024*4)) --time_based=1 --runtime=10 --hipri --iodepth 32
167 iops
fio took 33s


However, if I ask fio / io_uring to perform all those IOs at once, the performance is pretty decent again (but obviously that's not that desirable).

$ fio --ioengine io_uring --rw write --filesize 1GB --overwrite=1 --name=test --direct=1 --bs=$((1024*4)) --time_based=1 --runtime=10 --hipri --iodepth 32 --iodepth_batch_submit=32 --iodepth_batch_complete_min=32
394k iops


So it looks like there's something wrong around tracking what needs to
be polled for in hybrid mode.
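
For anyone wanting to repeat the four basic runs above, they could be scripted roughly as follows. This is a sketch under assumptions: DEV, DRY_RUN, fio_args and run_matrix are illustrative names, the fio options are copied from the commands above (with --bs written out as 4096), and by default the script only prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch: replay the classic vs. hybrid polling runs from the report.
# DEV, DRY_RUN, fio_args and run_matrix are illustrative names, not fio features.
DEV=${DEV:-nvme1n1}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, anything else = run them

fio_args() {
    # Same options as in the report; $1 is the iodepth.
    echo "--ioengine io_uring --rw write --filesize 1GB --overwrite=1" \
         "--name=test --direct=1 --bs=4096 --time_based=1 --runtime=10" \
         "--hipri --iodepth $1"
}

run_matrix() {
    for delay in -1 0; do   # -1 = classic polling, 0 = hybrid polling
        if [ "$DRY_RUN" = 1 ]; then
            echo "echo $delay | sudo tee /sys/block/$DEV/queue/io_poll_delay"
        else
            echo "$delay" | sudo tee "/sys/block/$DEV/queue/io_poll_delay" >/dev/null
        fi
        for depth in 1 32; do
            if [ "$DRY_RUN" = 1 ]; then
                echo "fio $(fio_args "$depth")"
            else
                # Word splitting of the option string is intended here.
                fio $(fio_args "$depth")
            fi
        done
    done
}
```

Calling run_matrix with the default DRY_RUN=1 lists the commands for review; setting DRY_RUN=0 first would actually change io_poll_delay and start fio, so that is best done only on a test device.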
-- 
Pavel Begunkov