Thread (66 messages) 66 messages, 9 authors, 2021-09-02

Re: [PATCH v3 16/18] scsi: ufs: Synchronize SCSI and UFS error handling

From: Bart Van Assche <bvanassche@acm.org>
Date: 2021-08-29 22:18:35

On 8/28/21 02:47, Adrian Hunter wrote:
There is a deadlock that seems to be related to this patch, because now
requests are blocked while the error handler waits on the host_sem.
Hi Adrian,

Some but not all of the issues mentioned below have been introduced by 
patch 16/18. Anyway, thank you for having shared your concerns.
Example:

ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
ufshcd_wl_suspend() wins the race but now PM requests deadlock:

because:
  scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE

because:
  scsi_schedule_eh() has done:
	    scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)


Some questions for thought:

Won't any holder of host_sem deadlock if it tries to do SCSI requests
and the error handler is waiting on host_sem?

Won't runtime resume deadlock if it is initiated by the error handler?
My understanding is that host_sem is used for the following purposes:
- To prevent that sysfs attributes are read or written after shutdown
   has started (hba->shutting_down).
- To serialize sysfs attribute access, clock scaling, error handling,
   the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.

I propose to make the following changes:
- Instead of checking the value of hba->shutting_down from inside sysfs
   attribute callbacks, remove sysfs attributes before starting shutdown.
   That will remove the need to check hba->shutting_down from inside
   sysfs attribute callbacks.
- Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
   and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
   attribute access is not the responsibility of a SCSI LLD - this is the
   responsibility of the power management core.
- Split host_sem. I don't see how else to address the potential deadlock
   between the error handler and runtime resume.

Do you agree with the above?

Thanks,

Bart.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help