Re: [PATCH V2 1/3] scsi: ufs: Fix error handler clear ua deadlock

From: Adrian Hunter <adrian.hunter@intel.com>
Date: 2021-09-07 11:06:00

On 7/09/21 3:37 am, Bart Van Assche wrote:

On 9/5/21 02:51, Adrian Hunter wrote:


On 3/09/21 11:29 pm, Bart Van Assche wrote:


On 9/3/21 2:56 AM, Adrian Hunter wrote:


Entering the queue is not guaranteed to succeed if requests are
blocked. That is because freezing the queue blocks entry to the
queue, but freezing also waits for outstanding requests, which
can make no progress while the queue is blocked.

That situation can happen when the error handler issues requests
to clear the unit attention condition. The deadlock is very
unlikely, and the error handler can be expected to clear ua at
some point anyway, so the simple solution is not to wait to enter
the queue.
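The circular wait described above can be modelled in a few lines of C. This is a minimal single-threaded sketch under stated assumptions, not block-layer code: `struct model_queue`, `model_enter()` and the `MODEL_*` results are hypothetical names, and the `nowait` flag stands in for what `BLK_MQ_REQ_NOWAIT` does for `blk_queue_enter()` (return -EBUSY instead of sleeping while the queue is frozen). Rather than actually sleeping, a waiting entry that could never finish is reported as `MODEL_DEADLOCK`. The model also assumes the caller is the only context that would unblock dispatch, as the error handler is assumed to be here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of one queue-entry attempt in this model. */
enum model_result {
	MODEL_ENTERED,  /* caller entered the queue */
	MODEL_EBUSY,    /* nowait caller refused without sleeping */
	MODEL_WAITS,    /* caller sleeps, but the freeze can still finish */
	MODEL_DEADLOCK, /* caller would sleep forever */
};

/* Hypothetical, heavily simplified request queue state. */
struct model_queue {
	bool frozen;   /* a freeze is in progress: new entries must wait */
	bool blocked;  /* dispatch is blocked: in-flight requests cannot finish */
	int in_flight; /* requests that entered and have not completed */
};

/*
 * Attempt to enter the queue. Assumes the caller is the only context that
 * would ever unblock dispatch: if it sleeps while requests are stuck, the
 * freeze never finishes, the queue is never unfrozen and nobody wakes it.
 */
static enum model_result model_enter(struct model_queue *q, bool nowait)
{
	assert(q->in_flight >= 0);

	if (!q->frozen) {
		q->in_flight++;
		return MODEL_ENTERED;
	}
	if (nowait)
		return MODEL_EBUSY; /* give up; ua can be cleared later */
	if (q->blocked && q->in_flight > 0)
		return MODEL_DEADLOCK;
	return MODEL_WAITS;
}
```

With `{ .frozen = true, .blocked = true, .in_flight = 1 }`, a waiting entry yields `MODEL_DEADLOCK` while a `nowait` entry returns `MODEL_EBUSY` at once, which is the trade-off described above: ua is not cleared on this attempt, but nothing hangs.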

Additionally, note that the RPMB queue might not be entered
because it is runtime suspended, but in that case ua will be
cleared at RPMB runtime resume.
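Skipping the RPMB clear is safe only because another path picks it up later. A hypothetical sketch of that hand-off follows; `struct model_wlun`, `model_try_clear_ua()` and `model_runtime_resume()` are made-up names, not the driver's functions. A failed non-blocking attempt leaves the unit attention pending, and the runtime-resume path clears it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-well-known-LUN state for this sketch. */
struct model_wlun {
	bool runtime_suspended; /* queue cannot be entered without resuming */
	bool ua_pending;        /* unit attention still needs clearing */
};

/* Non-blocking attempt: returns false and leaves ua pending if suspended. */
static bool model_try_clear_ua(struct model_wlun *w)
{
	if (w->runtime_suspended)
		return false;
	w->ua_pending = false;
	return true;
}

/* Runtime resume clears any unit attention that was left pending. */
static void model_runtime_resume(struct model_wlun *w)
{
	w->runtime_suspended = false;
	if (w->ua_pending) {
		bool cleared = model_try_clear_ua(w);

		assert(cleared); /* cannot fail once resumed */
		(void)cleared;
	}
}
```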

The only ufshcd_clear_ua_wluns() call that I am aware of and that
is related to error handling is the call in
ufshcd_err_handling_unprepare(). That call happens after
ufshcd_scsi_unblock_requests() has been called so how can it be
involved in a deadlock?

That is a very good question. I went back and reproduced the
deadlock again, and it happens because, in addition, ufshcd_state
is UFSHCD_STATE_EH_SCHEDULED_FATAL. So I have updated the commit
message accordingly in V3.


Additionally, the ufshcd_scsi_block_requests() and
ufshcd_scsi_unblock_requests() calls can be removed from
ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare().
These calls are no longer necessary since patch "scsi: ufs:
Synchronize SCSI and UFS error handling".

As has been noted, that commit introduces several new deadlocks, and
will presumably cause the deadlock this patch addresses, even if
ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL.

It is perhaps more appropriate to revert "scsi: ufs: Synchronize SCSI
and UFS error handling" for v5.15 and try to get things sorted out
for v5.16.  What do you think?

Reverting that patch would be a step backwards because it would make it again possible that the SCSI EH and UFS EH run concurrently and obstruct each other.

I wouldn't say it is a step backwards, just a step forward that the driver is not ready for.

For me, the change causes deadlocks, so it is a regression.

I have never seen SCSI EH cause a problem, but AFAICT it is not needed because the UFS driver's error handler is always scheduled when needed.

As a temporary workaround until the driver is ready for SCSI EH, interference between SCSI EH and UFS EH could presumably be avoided by setting eh_strategy_handler to an empty function.
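In the SCSI midlayer, the error handler thread calls the host's `transportt->eh_strategy_handler` when one is set and falls back to its default recovery, `scsi_unjam_host()`, only otherwise, so an empty handler would keep the SCSI EH recovery steps from touching the device. The sketch below models just that dispatch in userspace; `struct model_host`, `struct model_transport`, `model_error_handler()` and the counter are hypothetical stand-ins, not kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI host in this sketch. */
struct model_host {
	int default_recoveries; /* times the default EH recovery ran */
};

/* Hypothetical stand-in for the transport template's EH hook. */
struct model_transport {
	void (*eh_strategy_handler)(struct model_host *host);
};

/* Default recovery: aborts, resets, etc. are reduced to a counter here. */
static void model_unjam_host(struct model_host *host)
{
	host->default_recoveries++;
}

/* An intentionally empty strategy handler: SCSI EH recovery does nothing. */
static void model_empty_eh_strategy(struct model_host *host)
{
	(void)host;
}

/* Dispatch as the midlayer does: custom strategy if set, else default. */
static void model_error_handler(const struct model_transport *t,
				struct model_host *host)
{
	if (t->eh_strategy_handler)
		t->eh_strategy_handler(host);
	else
		model_unjam_host(host);
}
```

The model covers only which recovery runs; it does not model what happens to the failed commands themselves.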

Does the above mean that "if (hba->pm_op_in_progress)" should be removed from the following code in ufshcd_queuecommand()?

    case UFSHCD_STATE_EH_SCHEDULED_FATAL:
        if (hba->pm_op_in_progress) {
            hba->force_reset = true;
            set_host_byte(cmd, DID_BAD_TARGET);
            cmd->scsi_done(cmd);
            goto out;
        }

It seems to me that removing "if (hba->pm_op_in_progress)" would cause errors for requests that had not in fact even been issued.
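The decision at stake can be sketched as a userspace model of the excerpt above. It assumes that when the `pm_op_in_progress` test is false, the `UFSHCD_STATE_EH_SCHEDULED_FATAL` case falls through to returning `SCSI_MLQUEUE_HOST_BUSY` so the midlayer requeues the command (the excerpt does not show that part). The enums, `model_queuecommand()` and its `check_pm_op` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of queueing one command in this model. */
enum model_outcome {
	MODEL_ISSUED,   /* command sent to the controller */
	MODEL_REQUEUED, /* host busy: midlayer retries it later */
	MODEL_FAILED,   /* completed with DID_BAD_TARGET, never issued */
};

/* Subset of host states relevant to the excerpt. */
enum model_state { MODEL_OPERATIONAL, MODEL_EH_SCHEDULED_FATAL };

/*
 * check_pm_op == true models the existing code; false models removing the
 * "if (hba->pm_op_in_progress)" test, so every such command is failed.
 */
static enum model_outcome model_queuecommand(enum model_state state,
					     bool pm_op_in_progress,
					     bool check_pm_op)
{
	switch (state) {
	case MODEL_EH_SCHEDULED_FATAL:
		if (!check_pm_op || pm_op_in_progress)
			return MODEL_FAILED;
		return MODEL_REQUEUED; /* assumed fallthrough to host busy */
	case MODEL_OPERATIONAL:
	default:
		return MODEL_ISSUED;
	}
}
```

Calling `model_queuecommand(MODEL_EH_SCHEDULED_FATAL, false, false)` is the case raised in the reply: an ordinary command that never reached the controller is completed with an error instead of being retried.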
