Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early


From: Roger Willcocks <hidden>
Date: 2021-02-23 14:07:07

On 23 Feb 2021, at 08:57, John Garry [off-list ref] wrote:

On 22/02/2021 14:23, Roger Willcocks wrote:

quoted

FYI, we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1), so presumably this happens in RHEL 8 too.
The controller is an MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives, configured as five twelve-drive RAID-6 arrays, software-striped using md and formatted with XFS.
Test software writes to the array using multiple threads in parallel.
The smartpqi driver would report the controller offline within ten minutes or so, with status code 0x6100c.
I changed the driver to set 'nr_hw_queues = 1' and then tested by filling the array with random files (which took a couple of days). That completed fine, so it looks like that one-line change fixes it.

That just makes the driver single-queue.

All I can say is it fixes the problem. Write performance is two or three percent faster than CentOS 6.5 on the same hardware.

As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue #0 will ever be used in the driver.

And then, since the driver still spreads MSI-X interrupt vectors over all CPUs [from pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)], if the CPUs associated with HW queue #0 are offlined (probably just cpu0), there are no CPUs available to service the queue #0 interrupt. That's what I think would happen, from a quick glance at the code.

Surely that would be an issue even if it used multiple queues (one of which would be queue #0)?

quoted

Would, of course, be helpful if this was back-ported.
--
Roger
