Thread (14 messages) 14 messages, 3 authors, 2012-12-14

Re: [ 02/38] PCI/PM: Fix deadlock when unbinding device if parent in D3cold

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: 2012-11-26 19:30:19
Also in: lkml

On Mon, Nov 26, 2012 at 11:08:18AM -0800, Greg Kroah-Hartman wrote:
On Mon, Nov 26, 2012 at 10:55:05AM -0800, Greg Kroah-Hartman wrote:
quoted
On Fri, Nov 23, 2012 at 11:09:37AM +0800, Huang Ying wrote:
quoted
On Fri, 2012-11-23 at 02:35 +0000, Ben Hutchings wrote:
quoted
On Wed, 2012-11-21 at 16:39 -0800, Greg Kroah-Hartman wrote:
quoted
3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Huang Ying <redacted>

commit 90b5c1d7c45eeb622302680ff96ed30c1a2b6f0e upstream.

If a PCI device and its parents are put into D3cold, unbinding the
device will trigger deadlock as follow:

- driver_unbind
  - device_release_driver
    - device_lock(dev)				<--- previous lock here
    - __device_release_driver
      - pm_runtime_get_sync
        ...
          - rpm_resume(dev)
            - rpm_resume(dev->parent)
              ...
                - pci_pm_runtime_resume
                  ...
                  - pci_set_power_state
                    - __pci_start_power_transition
                      - pci_wakeup_bus(dev->parent->subordinate)
                        - pci_walk_bus
                          - device_lock(dev)	<--- deadlock here


If we do not do device_lock in pci_walk_bus, we can avoid deadlock.
Device_lock in pci_walk_bus is introduced in commit:
d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
said device_lock is added to pci_walk_bus because:

  Some error handling functions call pci_walk_bus. For example, PCIe
  aer. Here we lock the device, so the driver wouldn't detach from the
  device, as the cb might call driver's callback function.

So I fixed the deadlock as follows:

- remove device_lock from pci_walk_bus
- add device_lock into callback if callback will call driver's callback

I checked pci_walk_bus users one by one, and found only PCIe aer needs
device lock.
[...]

What about eeh_report_error() in
arch/powerpc/platforms/pseries/eeh_driver.c?
En...  Because pci_walk_bus() invocation is removed in 3.7, so this
patch is only valid for 3.7.  We need another version for 3.6.
quoted
Also, is the deadlock even possible before this change in Linux 3.6?

commit 448bd857d48e69b33ef323739dc6d8ca20d4cda7
Author: Huang Ying [off-list ref]
Date:   Sat Jun 23 10:23:51 2012 +0800

    PCI/PM: add PCIe runtime D3cold support
Before this, there will be no deadlock.  So we do not need fixes before
Linux 3.6.
So the first patch mentioned here should not be included in the 3.0 and
3.4 stable kernel releases?
Ok, I've dropped the patch from 3.0 and 3.4-stable trees now.

For 3.6, what should I do?
Ok, for now, I've dropped this patch from 3.6-stable as well.  If you
think it still needs to go there, can you provide something that works?
I see you posted one patch in this thread, but I don't understand if
that is what needs to go upstream in Linus's tree, or if it is for the
3.6-stable tree.  Please clarify.

thanks,

greg k-h
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help