Re: [PATCH V2 1/3] scsi: ufs: Fix error handler clear ua deadlock
From: Bart Van Assche <bvanassche@acm.org>
Date: 2021-09-07 00:38:02
On 9/5/21 02:51, Adrian Hunter wrote:
> On 3/09/21 11:29 pm, Bart Van Assche wrote:
>> On 9/3/21 2:56 AM, Adrian Hunter wrote:
>>> There is no guarantee to be able to enter the queue if requests are
>>> blocked. That is because freezing the queue will block entry to the
>>> queue, but freezing also waits for outstanding requests which can
>>> make no progress while the queue is blocked. That situation can
>>> happen when the error handler issues requests to clear unit
>>> attention condition. The deadlock is very unlikely, so the error
>>> handler can be expected to clear ua at some point anyway, so the
>>> simple solution is not to wait to enter the queue. Additionally,
>>> note that the RPMB queue might not be entered because it is runtime
>>> suspended, but in that case ua will be cleared at RPMB runtime
>>> resume.
>>
>> The only ufshcd_clear_ua_wluns() call that I am aware of and that is
>> related to error handling is the call in
>> ufshcd_err_handling_unprepare(). That call happens after
>> ufshcd_scsi_unblock_requests() has been called so how can it be
>> involved in a deadlock?
>
> That is a very good question. I went back to reproduce the deadlock
> again, and it is because, in addition, ufshcd_state is
> UFSHCD_STATE_EH_SCHEDULED_FATAL. So I have updated the commit message
> accordingly in V3.
>
>> Additionally, the ufshcd_scsi_block_requests() and
>> ufshcd_scsi_unblock_requests() calls can be removed from
>> ufshcd_err_handling_prepare() and ufshcd_err_handling_unprepare().
>> These calls are no longer necessary since patch "scsi: ufs:
>> Synchronize SCSI and UFS error handling".
>
> As has been noted, that commit introduces several new deadlocks - and
> will presumably cause the deadlock this patch addresses, even if
> ufshcd_state is not UFSHCD_STATE_EH_SCHEDULED_FATAL. It is perhaps
> more appropriate to revert "scsi: ufs: Synchronize SCSI and UFS error
> handling" for v5.15 and try to get things sorted out for v5.16. What
> do you think?

Reverting that patch would be a step backwards because it would make it
again possible that the SCSI EH and UFS EH run concurrently and obstruct
each other.

Does the above mean that "if (hba->pm_op_in_progress)" should be removed
from the following code in ufshcd_queuecommand()?

	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
		if (hba->pm_op_in_progress) {
			hba->force_reset = true;
			set_host_byte(cmd, DID_BAD_TARGET);
			cmd->scsi_done(cmd);
			goto out;
		}
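
To make the question concrete, here is a small user-space model of that
branch. It is only a sketch, not driver code: toy_hba, toy_queuecommand()
and the toy_result values are made up for illustration, and only
pm_op_in_progress, force_reset and the state name come from the snippet
above. It also assumes that a command which is not failed in this branch
is handed back for a later retry, which the snippet itself does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structures; not the real ufshcd types. */
enum toy_result { TOY_FAILED_BAD_TARGET, TOY_REQUEUED };

struct toy_hba {
	bool pm_op_in_progress;
	bool force_reset;
};

/*
 * Models only the UFSHCD_STATE_EH_SCHEDULED_FATAL branch.
 * check_pm == true mirrors the snippet above; check_pm == false models
 * the variant where the pm_op_in_progress test has been removed.
 * Assumption: a command that is not failed here is requeued.
 */
static enum toy_result toy_queuecommand(struct toy_hba *hba, bool check_pm)
{
	if (!check_pm || hba->pm_op_in_progress) {
		hba->force_reset = true;
		/* The command completes immediately with an error. */
		return TOY_FAILED_BAD_TARGET;
	}
	/* The command is handed back and stays pending until EH has run. */
	return TOY_REQUEUED;
}
```

In this model, with the check in place and no PM operation in progress,
the command is requeued rather than failed; with the check removed it is
always failed right away and force_reset is set.
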
Thanks,
Bart.