Thread (60 messages) 60 messages, 6 authors, 2018-01-29

Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle

From: Jens Axboe <axboe@kernel.dk>
Date: 2018-01-19 17:09:23
Also in: dm-devel, lkml

On 1/19/18 10:05 AM, Ming Lei wrote:
On Fri, Jan 19, 2018 at 09:52:32AM -0700, Jens Axboe wrote:
quoted
On 1/19/18 9:47 AM, Mike Snitzer wrote:
quoted
On Fri, Jan 19 2018 at 11:41am -0500,
Jens Axboe [off-list ref] wrote:
quoted
On 1/19/18 9:37 AM, Ming Lei wrote:
quoted
On Fri, Jan 19, 2018 at 09:27:46AM -0700, Jens Axboe wrote:
quoted
On 1/19/18 9:26 AM, Ming Lei wrote:
quoted
On Fri, Jan 19, 2018 at 09:19:24AM -0700, Jens Axboe wrote:
There are no pending requests for this case, nothing to restart the
queue. When you fail that blk_get_request(), you are idle, nothing
is pending.
I think we needn't worry about that, once a device is attached to
dm-rq, it can't be mounted any more, and usually user don't use the device
directly and by dm-mpath at the same time.
Even if it doesn't happen for a normal dm setup, it is a case that
needs to be handled. The request allocation is just one example of
a wider scope resource that can be unavailable. If the driver returns
NO_DEV_RESOURCE (or whatever name), it will be a possibility that
the device itself is currently idle.
How would a driver's resources be exhausted yet the device is idle (so
as not to be able to benefit from RESTART)?
I've outlined a number of these examples already. Another case might be:

1) Device is idle
2) Device gets request
3) Device attempts to DMA map
4) DMA map fails because the IOMMU is out of space (nic is using it all)
5) Device returns STS_RESOURCE
6) Queue is marked as needing a restart

All's well, except there is no IO on this device that will notice the
restart bit and retry the operation.

Replace IOMMU failure with any other resource that the driver might need
for an IO, which isn't tied to a device specific resource.
blk_get_request() on dm is an example, as is any allocation failure
occurring in the queue IO path for the driver.
Yeah, if we decide to take the approach of introducing NO_DEV_RESOURCE, all
the current STS_RESOURCE for non-device resource allocation(kmalloc, dma
map, get_request, ...) should be converted to NO_DEV_RESOURCE.

And it is a generic issue, which need generic solution.
Precisely.
Seems running queue after arbitrary in this case is the only way we
thought of, or other solutions?
I think that is the only solution. If it's a frequent enough occurence
to cause performance issues, then it's likely down to a specific
resource shortage, and we can tackle that independently (we need to,
since each of those will need a specialized solution).
If the decision is made, let's make/review the patch, :-)
Let 'er rip.

-- 
Jens Axboe
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help