Re: [PATCH 08/10] blk-mq-sched: add framework for MQ capable IO schedulers
From: Hannes Reinecke <hare@suse.de>
Date: 2017-01-13 17:43:45
Also in:
lkml
On 01/13/2017 05:41 PM, Omar Sandoval wrote:
On Fri, Jan 13, 2017 at 12:15:17PM +0100, Hannes Reinecke wrote:quoted
On 01/11/2017 10:40 PM, Jens Axboe wrote:quoted
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interfce, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independent of device queue depth. Signed-off-by: Jens Axboe <axboe@fb.com>[ .. ]quoted
@@ -823,6 +847,35 @@ static inline unsigned int queued_to_index(unsigned int queued) return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1); } +static bool blk_mq_get_driver_tag(struct request *rq, + struct blk_mq_hw_ctx **hctx, bool wait) +{ + struct blk_mq_alloc_data data = { + .q = rq->q, + .ctx = rq->mq_ctx, + .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu), + .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT, + }; + + if (blk_mq_hctx_stopped(data.hctx)) + return false; + + if (rq->tag != -1) { +done: + if (hctx) + *hctx = data.hctx; + return true; + } + + rq->tag = blk_mq_get_tag(&data); + if (rq->tag >= 0) { + data.hctx->tags->rqs[rq->tag] = rq; + goto done; + } + + return false; +} +What happens with the existing request at 'rqs[rq->tag]' ? Surely there is one already, right? Things like '->init_request' assume a fully populated array, so moving one entry to another location is ... interesting. I would have thought we need to do a request cloning here, otherwise this would introduce a memory leak, right? (Not to mention a potential double completion, as the request is now at two positions in the array) Cheers, HannesThe entries in tags->rqs aren't slab objects, they're pointers into pages allocated separately and tracked on tags->page_list. See blk_mq_alloc_rqs(). In blk_mq_free_rqs(), we free all of the pages on tags->page_list, so there shouldn't be a memory leak. As for hctx->tags->rqs, entries are only overwritten when a scheduler is enabled. In that case, the rqs array is storing pointers to requests actually from hctx->sched_tags, so overwriting/leaking isn't an issue.
Ah. Thanks. That explains it. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N�rnberg GF: J. Hawn, J. Guild, F. Imend�rffer, HRB 16746 (AG N�rnberg)