Re: hybrid polling on an nvme doesn't seem to work with iodepth > 1 on 5.10.0-rc5
From: Keith Busch <kbusch@kernel.org>
Date: 2020-12-11 03:38:48
On Fri, Dec 11, 2020 at 01:44:38AM +0000, Pavel Begunkov wrote:
On 11/12/2020 01:19, Andres Freund wrote:
On 2020-12-10 23:15:15 +0000, Pavel Begunkov wrote:
On 10/12/2020 23:12, Pavel Begunkov wrote:
On 10/12/2020 20:51, Andres Freund wrote:
Hi,

When using hybrid polling (i.e. echo 0 > /sys/block/nvme1n1/queue/io_poll_delay) I see stalls with fio when using an iodepth > 1. Sometimes fio hangs, other times the performance is really poor. I reproduced this with SSDs from different vendors.

Can you get poll stats from debugfs while running with hybrid? For both iodepth=1 and 32.

Even better if for 32 you would show it in dynamic, i.e. cat it several times while running it.

Should read all email before responding... This is a loop of grepping for 4k writes (only type I am doing), with 1s interval. I started it before the fio run (after one with iodepth=1). Once the iodepth 32 run finished (--timeout 10, but took 42s), I started a --iodepth 1 run.

Thanks! Your mean grows to more than 30s, so it'll sleep for 15s for each IO. Yep, the sleep time calculation is clearly broken for you. In general the current hybrid polling doesn't work well with high QD, that's because the statistics it is based on are not very resilient to all sorts of problems. And it might be a problem I described long ago:
https://www.spinics.net/lists/linux-block/msg61479.html
https://lkml.org/lkml/2019/4/30/120

It sounds like the statistic is using the wrong criteria. It ought to use the average time for the next available completion for any request rather than the average latency of a specific IO. It might work at high depth if the hybrid poll knew the hctx's depth when calculating the sleep time, but that information doesn't appear to be readily available.
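
To make the arithmetic concrete, here is a rough sketch in Python (not kernel code) comparing the two criteria. It assumes the sleep is half the mean per-IO latency, which matches the "mean of 30s, sleep for 15s" observation quoted above, and it assumes that with `depth` requests in flight the completions arrive evenly spaced. Both function names and the 100 us latency are hypothetical, chosen only for illustration.

```python
def half_mean_sleep_ns(mean_latency_ns: int) -> int:
    """Sleep for half the mean latency of a single IO (behaviour described in the thread)."""
    return (mean_latency_ns + 1) // 2


def depth_aware_sleep_ns(mean_latency_ns: int, depth: int) -> int:
    """Hypothetical alternative: sleep for half the expected gap to the next completion.

    Assumption: `depth` requests are in flight and complete evenly spaced,
    so some request completes roughly every mean_latency / depth.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    next_completion_ns = mean_latency_ns // depth
    return (next_completion_ns + 1) // 2


if __name__ == "__main__":
    mean_ns = 100_000  # assumed mean per-IO latency of 100 us, illustrative only
    for depth in (1, 32):
        print(depth, half_mean_sleep_ns(mean_ns), depth_aware_sleep_ns(mean_ns, depth))
```

Under these assumptions the two agree at depth 1, but at depth 32 the half-mean rule sleeps 50 us while some request is expected to complete roughly every 3 us, so the poller sleeps through many completions. Whether the depth can be obtained cheaply in the poll path is the open question raised above.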