Thread (14 messages) 14 messages, 3 authors, 2012-12-14

Re: [ 02/38] PCI/PM: Fix deadlock when unbinding device if parent in D3cold

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: 2012-11-30 02:01:11
Also in: lkml

On Fri, Nov 23, 2012 at 03:47:42PM +0800, Huang Ying wrote:
On Fri, 2012-11-23 at 11:09 +0800, Huang Ying wrote:
quoted
On Fri, 2012-11-23 at 02:35 +0000, Ben Hutchings wrote:
quoted
On Wed, 2012-11-21 at 16:39 -0800, Greg Kroah-Hartman wrote:
quoted
3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Huang Ying <redacted>

commit 90b5c1d7c45eeb622302680ff96ed30c1a2b6f0e upstream.

If a PCI device and its parents are put into D3cold, unbinding the
device will trigger deadlock as follow:

- driver_unbind
  - device_release_driver
    - device_lock(dev)				<--- previous lock here
    - __device_release_driver
      - pm_runtime_get_sync
        ...
          - rpm_resume(dev)
            - rpm_resume(dev->parent)
              ...
                - pci_pm_runtime_resume
                  ...
                  - pci_set_power_state
                    - __pci_start_power_transition
                      - pci_wakeup_bus(dev->parent->subordinate)
                        - pci_walk_bus
                          - device_lock(dev)	<--- deadlock here


If we do not do device_lock in pci_walk_bus, we can avoid deadlock.
Device_lock in pci_walk_bus is introduced in commit:
d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
said device_lock is added to pci_walk_bus because:

  Some error handling functions call pci_walk_bus. For example, PCIe
  aer. Here we lock the device, so the driver wouldn't detach from the
  device, as the cb might call driver's callback function.

So I fixed the deadlock as follows:

- remove device_lock from pci_walk_bus
- add device_lock into callback if callback will call driver's callback

I checked pci_walk_bus users one by one, and found only PCIe aer needs
device lock.
[...]

What about eeh_report_error() in
arch/powerpc/platforms/pseries/eeh_driver.c?
En...  Because pci_walk_bus() invocation is removed in 3.7, so this
patch is only valid for 3.7.  We need another version for 3.6.
Here is the patch for 3.6.  I have no powerpc machine, so build test
only.

Subject: [BUGFIX] PCI/PM: Fix deadlock when unbind device if its parent in D3cold

If a PCI device and its parents are put into D3cold, unbinding the
device will trigger deadlock as follow:

- driver_unbind
  - device_release_driver
    - device_lock(dev)				<--- previous lock here
    - __device_release_driver
      - pm_runtime_get_sync
        ...
          - rpm_resume(dev)
            - rpm_resume(dev->parent)
              ...
                - pci_pm_runtime_resume
                  ...
                  - pci_set_power_state
                    - __pci_start_power_transition
                      - pci_wakeup_bus(dev->parent->subordinate)
                        - pci_walk_bus
                          - device_lock(dev)	<--- dead lock here


If we do not do device_lock in pci_walk_bus, we can avoid dead lock.
Device_lock in pci_walk_bus is introduced in commit:
d71374dafbba7ec3f67371d3b7e9f6310a588808, corresponding email thread
is: https://lkml.org/lkml/2006/5/26/38.  The patch author Zhang Yanmin
said device_lock is added to pci_walk_bus because:

  Some error handling functions call pci_walk_bus. For example, PCIe
  aer. Here we lock the device, so the driver wouldn't detach from the
  device, as the cb might call driver's callback function.

So I fixed the dead lock as follow:

- remove device_lock from pci_walk_bus
- add device_lock into callback if callback will call driver's callback

I checked pci_walk_bus users one by one, and found only PCIe aer needs
device lock.

Signed-off-by: Huang Ying <redacted>
Cc: Zhang Yanmin <redacted>
---
 arch/powerpc/platforms/pseries/eeh_driver.c |   51 ++++++++++++++++++----------
Due to me applying a power pci patch,
feadf7c0a1a7c08c74bebb4a13b755f8c40e3bbc in Linus's tree to 3.6-stable,
this patch doesn't apply here anymore.

Because that patch is in the tree, is it now just safe to take your
original, unmodified, version of this patch for 3.6-stable?

thanks,

greg k-h
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help