Thread (16 messages) 16 messages, 4 authors, 2012-06-21

Re: [PATCH] driver core: fix shutdown races with probe/remove(v2)

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: 2012-06-20 22:38:04
Also in: lkml

On Tue, Jun 19, 2012 at 10:00:36AM +0800, Ming Lei wrote:
On Tue, Jun 19, 2012 at 6:25 AM, Greg Kroah-Hartman
[off-list ref] wrote:
quoted
Have you read Documentation/stable_kernel_patches.txt? �Please do so and
I haven't read Documentation/stable_kernel_patches.txt, but read
Documentation/stable_kernel_rules.txt, :-)
Sorry, you are correct :)
quoted
see why I can't take this patch for a stable tree. �Note that no one has
ever reported this as a bug before, and the original poster ran away
never to be heard from again, so I really don't think it was a real
problem that people ever saw.
If Documentation/stable_kernel_rules.txt is the correct doc for stable rule,
looks reporter requirement isn't listed in the file, but the below can be found:

          - No "theoretical race condition" issues, unless an explanation of
         how the race can be exploited is also provided.

so I marked it as -stable because I have explained how the race can be
exploited in reality.
Ok, but as this has been there since when, 2.5, I'll refrain from
marking it this way, as no one has reported a real problem like this
before.
quoted
quoted
quoted
quoted
Signed-off-by: Ming Lei <redacted>
---
v2:
� � � - take Alan's suggestion to use device_trylock to avoid
� � � hanging during shutdown by buggy device or driver
� � � - hold parent reference counter

�drivers/base/core.c | � 32 ++++++++++++++++++++++++++++++++
�1 file changed, 32 insertions(+)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 346be8b..f2fc989 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1796,6 +1796,16 @@ out:
�}
�EXPORT_SYMBOL_GPL(device_move);

+static int __try_lock(struct device *dev)
+{
+ � � int i = 0;
+
+ � � while (!device_trylock(dev) && i++ < 100)
+ � � � � � � msleep(10);
+
+ � � return i < 100;
+}
That's a totally arbritary time, why does this work and other times do
not? �And what is this returning, if the lock was grabbed successfully?
�It is a timeout time and is 1sec now. If the lock can't be held in 1sec, the
�function will return 0, otherwise it will return 1 and indicates that the lock
�has been held successfully.
My point is why 1 second? �That's completly arbitrary and means nothing.
1 second is a estimated value, just like many other timeout values in kernel.

For example, the timeout passed to usb_control_msg() is mostly estimated
first, then may be adjusted later from some new report.

Another example is one recent xhci fix commit:

622eb783fe(xHCI: Increase the timeout for controller save/restore state
operation)

the timeout value is just increased arbitrarily to adapt new device.

I still have more many examples in kernel about timeout value...
Yes, I know this, but now you are putting a limit on the amount of time
a probe function can take, when before, we have never had one.  That's
not something to be taken lightly, and is one I know is not true.
quoted
Why not just do a real lock and try for forever?
IMO, there are two advantages not just doing a real lock for forever:

- avoiding buggy device/driver to hang the system
- with trylock, we can log the buggy device so that it is a bit
easier to troubleshoot the buggy drivers, suppose the bug is
only triggered 1 time in one year or more
No, just fix the driver, I don't want to put a time limit on how long
probe can take, as we never have in the past and I'm sure that whatever
we pick, will be wrong for someone.

I have seen devices that take many seconds, and minutes for some if bad
things happen (i.e. the firmware doesn't download properly).  Don't
break people's working systems.

greg k-h
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help