Thread (19 messages, 2 authors, 2021-09-16)

Re: [PATCH V3 1/3] scsi: ufs: Fix error handler clear ua deadlock

From: Adrian Hunter <adrian.hunter@intel.com>
Date: 2021-09-13 08:53:06

On 13/09/21 6:17 am, Bart Van Assche wrote:
> On 9/11/21 09:47, Adrian Hunter wrote:
>> On 8/09/21 1:36 am, Bart Van Assche wrote:
>>> --- a/drivers/scsi/ufs/ufshcd.c
>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>> @@ -2707,6 +2707,14 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>>>  		}
>>>  		fallthrough;
>>>  	case UFSHCD_STATE_RESET:
>>> +		/*
>>> +		 * The SCSI error handler only starts after all pending commands
>>> +		 * have failed or timed out. Complete commands with
>>> +		 * DID_IMM_RETRY to allow the error handler to start
>>> +		 * if it has been scheduled.
>>> +		 */
>>> +		set_host_byte(cmd, DID_IMM_RETRY);
>>> +		cmd->scsi_done(cmd);
>> Setting a non-zero return value, in this case "err =
>> SCSI_MLQUEUE_HOST_BUSY", will cause scsi_dec_host_busy() anyway, so
>> does this make any difference?
> The return value should be changed to 0, since returning
> SCSI_MLQUEUE_HOST_BUSY is only allowed if cmd->scsi_done(cmd) has not
> yet been called.

> I expect that setting the host byte to DID_IMM_RETRY and calling
> scsi_done() will make a difference, otherwise I wouldn't have suggested
> this. As explained in my previous email, doing that triggers the SCSI
> command completion and resubmission paths. Resubmission only happens
> if the SCSI error handler has not yet been scheduled. The SCSI error
> handler is scheduled after scsi_done() has been called for, or a
> timeout has occurred on, every pending command. In other words, setting
> the host byte to DID_IMM_RETRY and calling scsi_done() makes it possible
> for the error handler to be scheduled, something that won't happen if
> ufshcd_queuecommand() systematically returns SCSI_MLQUEUE_HOST_BUSY.
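The return-value rule Bart states can be captured as a small userspace model of the ->queuecommand() contract: a command is either completed through scsi_done() and reported with 0, or handed back with SCSI_MLQUEUE_HOST_BUSY without scsi_done() having run. This is a sketch under stated assumptions, not kernel code: struct model_cmd, model_scsi_done(), model_reset_case(), model_reset_case_busy() and model_contract_ok() are hypothetical names, and the numeric constants only mirror the kernel's values for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copies of kernel constants (hypothetical model, not <scsi/scsi.h>). */
#define MODEL_QUEUE_OK          0
#define MODEL_MLQUEUE_HOST_BUSY 0x1055
#define MODEL_DID_IMM_RETRY     0x0c

/* Hypothetical stand-in for struct scsi_cmnd: only what the contract needs. */
struct model_cmd {
	int host_byte;  /* value a driver would set with set_host_byte() */
	int done_calls; /* number of times the completion callback ran */
};

/* Stand-in for cmd->scsi_done(cmd). */
static void model_scsi_done(struct model_cmd *cmd)
{
	cmd->done_calls++;
}

/*
 * Check a (return value, completion) pair against the rule from the thread:
 * a non-zero "busy" return is only allowed if scsi_done() has not been
 * called, and a command must never be completed twice.
 */
static bool model_contract_ok(int ret, const struct model_cmd *cmd)
{
	if (cmd->done_calls > 1)
		return false;
	if (ret != MODEL_QUEUE_OK)
		return cmd->done_calls == 0;
	return true;
}

/* The proposed UFSHCD_STATE_RESET branch, with the return value changed to 0. */
static int model_reset_case(struct model_cmd *cmd)
{
	cmd->host_byte = MODEL_DID_IMM_RETRY;
	model_scsi_done(cmd);
	return MODEL_QUEUE_OK;
}

/* The unpatched branch: hand the command back without completing it. */
static int model_reset_case_busy(struct model_cmd *cmd)
{
	(void)cmd;
	return MODEL_MLQUEUE_HOST_BUSY;
}
```

Calling model_scsi_done() and then returning MODEL_MLQUEUE_HOST_BUSY is exactly the pair that model_contract_ok() rejects, which is why the patched branch has to return 0.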
Not getting it, sorry. :-(

The error handler sets UFSHCD_STATE_RESET and never exits with the
state still set to UFSHCD_STATE_RESET, so that case does not need to
start the error handler: it is already running.

The error handler is always scheduled after setting 
UFSHCD_STATE_EH_SCHEDULED_FATAL.
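These two observations amount to an invariant on the UFS host state: whenever the state is UFSHCD_STATE_RESET the error handler is running, and whenever it is UFSHCD_STATE_EH_SCHEDULED_FATAL the error handler has been scheduled (or has since started). A minimal sketch of that invariant follows, assuming a simplified three-state machine; the model_* names and the eh_scheduled/eh_running flags are hypothetical, and the real driver has more states than are modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the ufshcd states discussed in the thread. */
enum model_state {
	MODEL_OPERATIONAL,
	MODEL_EH_SCHEDULED_FATAL,
	MODEL_RESET,
};

struct model_hba {
	enum model_state state;
	bool eh_scheduled; /* error handler queued but not yet running */
	bool eh_running;   /* error handler currently executing */
};

/* A fatal error: the state is set and the error handler is always scheduled. */
static void model_fatal_error(struct model_hba *hba)
{
	hba->state = MODEL_EH_SCHEDULED_FATAL;
	hba->eh_scheduled = true;
}

/* The error handler starts and moves the host into the reset state. */
static void model_eh_start(struct model_hba *hba)
{
	hba->eh_scheduled = false;
	hba->eh_running = true;
	hba->state = MODEL_RESET;
}

/* The error handler never exits with the state still set to reset. */
static void model_eh_finish(struct model_hba *hba)
{
	hba->state = MODEL_OPERATIONAL;
	hba->eh_running = false;
}

/* True if the state implies the error handler is already pending or active. */
static bool model_invariant_holds(const struct model_hba *hba)
{
	switch (hba->state) {
	case MODEL_RESET:
		return hba->eh_running;
	case MODEL_EH_SCHEDULED_FATAL:
		return hba->eh_scheduled || hba->eh_running;
	default:
		return true;
	}
}
```

Under this invariant, a queuecommand() call that observes either state has nothing left to start, which is the basis of the objection above.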

scsi_dec_host_busy() is called for any non-zero return value, such as
SCSI_MLQUEUE_HOST_BUSY, i.e. in scsi_queue_rq():
	reason = scsi_dispatch_cmd(cmd);
	if (reason) {
		scsi_set_blocked(cmd, reason);
		ret = BLK_STS_RESOURCE;
		goto out_dec_host_busy;
	}

	return BLK_STS_OK;

out_dec_host_busy:
	scsi_dec_host_busy(shost, cmd);

And that will wake the error handler:

static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
	unsigned long flags;

	rcu_read_lock();
	__clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
	if (unlikely(scsi_host_in_recovery(shost))) {
		spin_lock_irqsave(shost->host_lock, flags);
		if (shost->host_failed || shost->host_eh_scheduled)
			scsi_eh_wakeup(shost);
		spin_unlock_irqrestore(shost->host_lock, flags);
	}
	rcu_read_unlock();
}
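The effect of the two excerpts can be checked with a counter model. scsi_eh_wakeup() (not quoted here) only wakes the error handler thread once scsi_host_busy() equals host_failed, i.e. once every command still counted as busy has failed. The sketch below mirrors the quoted scsi_dec_host_busy() on plain integers; struct model_host and the model_* functions are hypothetical, and locking and RCU are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the Scsi_Host fields involved. */
struct model_host {
	int busy;          /* scsi_host_busy(): commands marked in flight */
	int failed;        /* host_failed: commands handed to the error handler */
	bool eh_scheduled; /* host_eh_scheduled */
	bool in_recovery;  /* scsi_host_in_recovery() */
	bool eh_woken;     /* set when the error handler thread would be woken */
};

/* Mirrors the condition checked in scsi_eh_wakeup(). */
static void model_eh_wakeup(struct model_host *sh)
{
	if (sh->busy == sh->failed)
		sh->eh_woken = true;
}

/* Mirrors the quoted scsi_dec_host_busy(), minus locking and RCU. */
static void model_dec_host_busy(struct model_host *sh)
{
	sh->busy--; /* corresponds to clearing SCMD_STATE_INFLIGHT */
	if (sh->in_recovery && (sh->failed || sh->eh_scheduled))
		model_eh_wakeup(sh);
}
```

For example, with one failed command and one command bounced with SCSI_MLQUEUE_HOST_BUSY, the decrement brings busy down to failed and the handler is woken; with a third command still outstanding in the driver, it is not.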

Note that scsi_host_queue_ready() won't let any requests through
while scsi_host_in_recovery() is true, so the potential problem is
with requests that have already been successfully submitted to the
UFS driver but have not yet completed. The change you suggest does
not help with that.
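The concern about already-submitted requests can be illustrated with a small model. Once the host is in recovery, new requests are refused before they reach the driver, so the busy count can only drop to host_failed if commands already in the driver complete (decrementing busy) or time out (being added to host_failed). The names below are hypothetical; the gate mirrors the early return in scsi_host_queue_ready(), and the timeout path is simplified to a single counter increment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for one SCSI host. */
struct model_host {
	int busy;         /* commands in flight (already passed to the LLD) */
	int failed;       /* commands handed to the error handler */
	bool in_recovery; /* scsi_host_in_recovery() */
};

/* Gate for new requests: refuses everything while the host is in recovery. */
static bool model_host_queue_ready(struct model_host *sh)
{
	if (sh->in_recovery)
		return false; /* request stays in the block layer */
	sh->busy++;
	return true;
}

/* An in-flight command completes normally. */
static void model_complete(struct model_host *sh)
{
	sh->busy--;
}

/* An in-flight command times out and is handed to the error handler. */
static void model_timeout(struct model_host *sh)
{
	sh->failed++;
}

/* The error handler can only run once every busy command has failed. */
static bool model_eh_can_run(const struct model_host *sh)
{
	return sh->busy == sh->failed;
}
```

In this model nothing done on the submission path changes the outcome for a command that is already stuck in the driver: only model_complete() or model_timeout() on that command unblocks the handler.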

That seems like another problem with the patch 
"scsi: ufs: Synchronize SCSI and UFS error handling".

> In the latter case the block layer timer is reset over and over
> again. See also the blk_mq_start_request() call in scsi_queue_rq(). One
> could wonder whether this is really what the SCSI core should do if a
> SCSI LLD keeps returning the SCSI_MLQUEUE_HOST_BUSY status code ...
>
> Bart.
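The timer-reset remark can be sketched as follows: if every dispatch restarts the request timer (as the blk_mq_start_request() call in scsi_queue_rq() does) and the LLD keeps answering SCSI_MLQUEUE_HOST_BUSY, the request is redispatched before its deadline and never times out as long as the retry interval is shorter than the timeout. The time units, retry interval and model_* names are hypothetical; real requeue delays are chosen by the block layer.

```c
#include <assert.h>

/* Hypothetical request with a single timeout deadline (arbitrary time units). */
struct model_req {
	long deadline;
};

/* Stand-in for blk_mq_start_request(): (re)arms the timeout on every dispatch. */
static void model_start_request(struct model_req *rq, long now, long timeout)
{
	rq->deadline = now + timeout;
}

/*
 * Simulate an LLD that returns "host busy" on every dispatch, with the
 * request redispatched every retry_delay units. Returns the time at which
 * the timeout fires, or -1 if it has not fired by 'horizon'.
 */
static long model_busy_loop(long timeout, long retry_delay, long horizon)
{
	struct model_req rq;
	long now = 0;

	assert(retry_delay > 0);
	model_start_request(&rq, now, timeout);
	while (now < horizon) {
		long next = now + retry_delay;

		if (rq.deadline <= next)
			return rq.deadline; /* timer expires before the next redispatch */
		now = next;
		model_start_request(&rq, now, timeout); /* redispatch resets the timer */
	}
	return -1;
}
```

With a 30-unit timeout and a 3-unit retry interval, model_busy_loop(30, 3, 1000) returns -1; only when the retry interval exceeds the timeout, as in model_busy_loop(30, 50, 1000), does the timeout fire, at time 30.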
  