RE: [bug report] shared tags causes IO hang and performance drop

From: Kashyap Desai <kashyap.desai@broadcom.com>
Date: 2021-04-14 10:42:32
Also in: linux-scsi

Hi Ming,

quoted

It is reported inside RH that CPU utilization increases by ~20% when
running a simple fio test inside a VM whose disk is built on an image
stored on XFS/megaraid_sas.

When I tried to investigate by reproducing the issue via scsi_debug, I
found an IO hang when running randread IO (8k, direct IO, libaio) on a
scsi_debug disk created by the following command:

	modprobe scsi_debug host_max_queue=128 submit_queues=$NR_CPUS virtual_gb=256
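
The fio job itself was not posted. As a rough sketch of what the workload described above could look like: only randread, 8k, direct IO and libaio come from the report, while the device path, iodepth, numjobs and runtime below are assumptions chosen for illustration. The helper build_fio_cmd is hypothetical and only prints the command line so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical reproducer sketch. Only randread/8k/direct/libaio are taken
# from the report; DEV, iodepth, numjobs and runtime are assumptions.
DEV="${DEV:-/dev/sdX}"	# placeholder for the scsi_debug disk

# Print the fio command line for the given device, iodepth, numjobs and
# runtime (seconds) instead of executing it.
build_fio_cmd() {
	printf 'fio --name=randread --filename=%s --rw=randread --bs=8k --direct=1 --ioengine=libaio --iodepth=%s --numjobs=%s --runtime=%s --time_based --group_reporting\n' \
		"$1" "$2" "$3" "$4"
}

FIO_CMD=$(build_fio_cmd "$DEV" 64 4 30)
echo "$FIO_CMD"
# To actually run it (as root, against a scratch device only):
#	eval "$FIO_CMD"
```

Switching the scheduler under /sys/block/<disk>/queue/scheduler to mq-deadline before starting the job would match the configuration in which the hang is reported in this thread.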

So I can recreate this hang when using the mq-deadline IO sched for
scsi_debug, in that fio does not exit. I'm using v5.12-rc7.

I can also recreate this issue using mq-deadline. With none, there is no
IO hang.
Also, if I run a script that periodically switches the scheduler (none,
mq-deadline), the write to the sysfs entry hangs.

Here is the call trace:
Call Trace:
[ 1229.879862]  __schedule+0x29d/0x7a0
[ 1229.879871]  schedule+0x3c/0xa0
[ 1229.879875]  blk_mq_freeze_queue_wait+0x62/0x90
[ 1229.879880]  ? finish_wait+0x80/0x80
[ 1229.879884]  elevator_switch+0x12/0x40
[ 1229.879888]  elv_iosched_store+0x79/0x120
[ 1229.879892]  ? kernfs_fop_write_iter+0xc7/0x1b0
[ 1229.879897]  queue_attr_store+0x42/0x70
[ 1229.879901]  kernfs_fop_write_iter+0x11f/0x1b0
[ 1229.879905]  new_sync_write+0x11f/0x1b0
[ 1229.879912]  vfs_write+0x184/0x250
[ 1229.879915]  ksys_write+0x59/0xd0
[ 1229.879917]  do_syscall_64+0x33/0x40
[ 1229.879922]  entry_SYSCALL_64_after_hwframe+0x44/0xae
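
The scheduler-switching script was not posted. A minimal sketch of such a loop, assuming the usual /sys/block/<disk>/queue/scheduler sysfs file, is shown here; the function name toggle_scheduler, the round count and the interval are hypothetical. On an affected kernel the echo may block in the same path as the trace above.

```shell
#!/bin/sh
# Hypothetical sketch: alternate the IO scheduler of one block device.
# $1: scheduler sysfs file, e.g. /sys/block/<disk>/queue/scheduler
# $2: number of none -> mq-deadline rounds
# $3: optional pause in seconds between switches (default 0)
toggle_scheduler() {
	sched_file=$1
	rounds=$2
	interval=${3:-0}
	i=0
	while [ "$i" -lt "$rounds" ]; do
		for sched in none mq-deadline; do
			# The write goes through elv_iosched_store(); with IO in
			# flight it may wait in blk_mq_freeze_queue_wait().
			echo "$sched" > "$sched_file" || return 1
			sleep "$interval"
		done
		i=$((i + 1))
	done
}

# Example (as root, while fio is running on the disk):
#	toggle_scheduler /sys/block/sdX/queue/scheduler 100 1
```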


I tried both 5.12.0-rc1 and 5.11.0-rc2+, and the behavior is the same.
Let me also check megaraid_sas to see whether this is generic or a
special case of scsi_debug.

Do you have any idea of what changed to cause this, as we would have
tested this before? Or maybe only with the none IO sched on scsi_debug.
And normally with a 4k block size and only rw=read (for me, anyway).

Note that host_max_queue=128 will cap the submit queue depth at 128,
while it would be 192 by default.

Will check more...including CPU utilization.

Thanks,
John

quoted

It looks like this is caused by SCHED_RESTART: currently the restart is
only done on the current hctx, and we may need to restart all hctxs for
shared tags. The issue can be fixed by the appended patch. However, IOPS
drops by more than 10% with the patch.

So any ideas about this issue and the original performance drop?

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e1e997af89a0..45188f7aa789 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -59,10 +59,18 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
 
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
+	bool shared_tag = blk_mq_is_sbitmap_shared(hctx->flags);
+
+	if (shared_tag)
+		blk_mq_run_hw_queues(hctx->queue, true);
+
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
 		return;
 	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
+	if (shared_tag)
+		return;
+
 	/*
 	 * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch)
 	 * in blk_mq_run_hw_queue(). Its pair is the barrier in

Thanks,
Ming

