Re: [PATCH V2 1/3] scsi: ufs: Fix error handler clear ua deadlock
From: Adrian Hunter <adrian.hunter@intel.com>
Date: 2021-09-07 11:06:00
On 7/09/21 3:37 am, Bart Van Assche wrote:
On 9/5/21 02:51, Adrian Hunter wrote:quoted
On 3/09/21 11:29 pm, Bart Van Assche wrote:quoted
On 9/3/21 2:56 AM, Adrian Hunter wrote:quoted
There is no guarantee to be able to enter the queue if requests are blocked. That is because freezing the queue will block entry to the queue, but freezing also waits for outstanding requests which can make no progress while the queue is blocked. That situation can happen when the error handler issues requests to clear unit attention condition. The deadlock is very unlikely, so the error handler can be expected to clear ua at some point anyway, so the simple solution is not to wait to enter the queue. Additionally, note that the RPMB queue might be not be entered because it is runtime suspended, but in that case ua will be cleared at RPMB runtime resume.The only ufshcd_clear_ua_wluns() call that I am aware of and that is related to error handling is the call in ufshcd_err_handling_unprepare(). That call happens after ufshcd_scsi_unblock_requests() has been called so how can it be involved in a deadlock?That is a very good question. I went back to reproduce the deadlock again, and it is because, in addition, ufshcd_state is UFSHCD_STATE_EH_SCHEDULED_FATAL. So I have updated the commit message accordingly in V3.quoted
Additionally, the ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests() calls can be removed from ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare(). These calls are no longer necessary since patch "scsi: ufs: Synchronize SCSI and UFS error handling".As has been noted, that commit introduces several new deadlocks - and will presumably cause the deadlock this patches addresses, even if ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL. It is perhaps more appropriate to revert "scsi: ufs: Synchronize SCSI and UFS error handling" for v5.15 and try to get things sorted out for v5.16. What do you think?Reverting that patch would be a step backwards because it would make it again possible that the SCSI EH and UFS EH run concurrently and obstruct each other.
I wouldn't say it is a step backwards, just a step forwards the driver is not ready for. For me, the change causes deadlocks so it is a regression. I have never seen SCSI EH cause a problem, but AFAICT it is not needed because the UFS driver's error handler is always scheduled when needed. As a temporary workaround until the driver is ready for SCSI EH, interference between SCSI EH and UFS EH could presumably be avoided by setting eh_strategy_handler to an empty function.
Does the above mean that "if (hba->pm_op_in_progress)" should be removed from the following code in ufshcd_queuecommand()?
case UFSHCD_STATE_EH_SCHEDULED_FATAL:
if (hba->pm_op_in_progress) {
hba->force_reset = true;
set_host_byte(cmd, DID_BAD_TARGET);
cmd->scsi_done(cmd);
goto out;
}It seems to me that removing "if (hba->pm_op_in_progress)" would cause errors for requests that had not in fact even been issued.