Thread (16 messages) 16 messages, 4 authors, 2018-12-14

RE: +AFs-PATCH+AF0- blk-mq: Set request mapping to NULL in blk+AF8-mq+AF8-put+AF8-driver+AF8-tag

From: Kashyap Desai <kashyap.desai@broadcom.com>
Date: 2018-12-04 18:18:52

-----Original Message-----
From: Bart Van Assche [mailto:bvanassche@acm.org]
Sent: Tuesday, December 4, 2018 10:45 PM
To: Kashyap Desai; linux-block; Jens Axboe; Ming Lei; linux-scsi
Cc: Suganath Prabu Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty
Subject: Re: [PATCH] blk-mq: Set request mapping to NULL in
blk_mq_put_driver_tag

On Tue, 2018-12-04 at 22:17 +0530, Kashyap Desai wrote:
quoted
+ Linux-scsi
quoted
quoted
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 9497b47..57432be 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -175,6 +175,7 @@ static inline bool
blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
  static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx
*hctx,
quoted
quoted
quoted
                         struct request *rq)
  {
+    hctx->tags->rqs[rq->tag] = NULL;
      blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
      rq->tag = -1;
No SCSI driver should call scsi_host_find_tag() after a request has
finished. The above patch introduces yet another race and hence can't
be
a proper fix.
Bart, many scsi drivers use scsi_host_find_tag() to traverse max tag_id
to
find out pending IO in firmware.
One of the use case is -  HBA firmware recovery.  In case of firmware
recovery, driver may require to traverse the list and return back
pending
scsi command to SML for retry.
I quickly grep the scsi code and found that snic_scsi, qla4xxx, fnic,
mpt3sas are using API scsi_host_find_tag for the same purpose.

Without this patch, we hit very basic kernel panic due to page fault.
This
is not an issue in non-mq code path. Non-mq path use
blk_map_queue_find_tag() and that particular API does not provide stale
requests.
As I wrote before, your patch doesn't fix the race you described but only
makes the race window smaller.
Hi Bart,

Let me explain the issue. It is not a race, but very straight issue.  Let's
say we have one scsi_device /dev/sda and total IO submitted + completed are
some number 100.
All the 100 IO is *completed*.   Now, As part of Firmware recovery, driver
tries to find our outstanding IOs using scsi_host_find_tag().
Block layer will return all the 100 commands to the driver but really those
100 commands are not outstanding. This patch will return *actual*
outstanding commands.
If scsi_device /dev/sda is not removed in OS, driver accessing scmd of those
100 commands are safe memory access.

Now consider a case where scsi_device /dev/sda is removed and driver
performs firmware recovery. This time driver will crash while accessing scmd
(randomly based on memory reused.).

Along with this patch, low level driver should make sure that all request
queue at block layer is quiesce.

If you want an example of how to use
scsi_host_find_tag() properly, have a look at the SRP initiator driver
(drivers/infiniband/ulp/srp). That driver uses scsi_host_find_tag()
without
triggering any NULL pointer dereferences.
I am not able to find right context from srp, but I check the srp code and
looks like that driver is getting scmd using scsi_host_find_tag() for live
command.
The approach used in that driver
also works when having to support HBA firmware recovery.

Bart.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help