Re: [PATCH] driver core: fix shutdown races with probe/remove(v2)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: 2012-06-20 22:38:04
Also in:
lkml
On Tue, Jun 19, 2012 at 10:00:36AM +0800, Ming Lei wrote:
On Tue, Jun 19, 2012 at 6:25 AM, Greg Kroah-Hartman [off-list ref] wrote:quoted
Have you read Documentation/stable_kernel_patches.txt? �Please do so andI haven't read Documentation/stable_kernel_patches.txt, but read Documentation/stable_kernel_rules.txt, :-)
Sorry, you are correct :)
quoted
see why I can't take this patch for a stable tree. �Note that no one has ever reported this as a bug before, and the original poster ran away never to be heard from again, so I really don't think it was a real problem that people ever saw.If Documentation/stable_kernel_rules.txt is the correct doc for stable rule, looks reporter requirement isn't listed in the file, but the below can be found: - No "theoretical race condition" issues, unless an explanation of how the race can be exploited is also provided. so I marked it as -stable because I have explained how the race can be exploited in reality.
Ok, but as this has been there since when, 2.5, I'll refrain from marking it this way, as no one has reported a real problem like this before.
quoted
quoted
quoted
quoted
Signed-off-by: Ming Lei <redacted> --- v2: � � � - take Alan's suggestion to use device_trylock to avoid � � � hanging during shutdown by buggy device or driver � � � - hold parent reference counter �drivers/base/core.c | � 32 ++++++++++++++++++++++++++++++++ �1 file changed, 32 insertions(+)diff --git a/drivers/base/core.c b/drivers/base/core.c index 346be8b..f2fc989 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c@@ -1796,6 +1796,16 @@ out:�} �EXPORT_SYMBOL_GPL(device_move); +static int __try_lock(struct device *dev) +{ + � � int i = 0; + + � � while (!device_trylock(dev) && i++ < 100) + � � � � � � msleep(10); + + � � return i < 100; +}That's a totally arbritary time, why does this work and other times do not? �And what is this returning, if the lock was grabbed successfully?�It is a timeout time and is 1sec now. If the lock can't be held in 1sec, the �function will return 0, otherwise it will return 1 and indicates that the lock �has been held successfully.My point is why 1 second? �That's completly arbitrary and means nothing.1 second is a estimated value, just like many other timeout values in kernel. For example, the timeout passed to usb_control_msg() is mostly estimated first, then may be adjusted later from some new report. Another example is one recent xhci fix commit: 622eb783fe(xHCI: Increase the timeout for controller save/restore state operation) the timeout value is just increased arbitrarily to adapt new device. I still have more many examples in kernel about timeout value...
Yes, I know this, but now you are putting a limit on the amount of time a probe function can take, when before, we have never had one. That's not something to be taken lightly, and is one I know is not true.
quoted
Why not just do a real lock and try for forever?IMO, there are two advantages not just doing a real lock for forever: - avoiding buggy device/driver to hang the system - with trylock, we can log the buggy device so that it is a bit easier to troubleshoot the buggy drivers, suppose the bug is only triggered 1 time in one year or more
No, just fix the driver, I don't want to put a time limit on how long probe can take, as we never have in the past and I'm sure that whatever we pick, will be wrong for someone. I have seen devices that take many seconds, and minutes for some if bad things happen (i.e. the firmware doesn't download properly). Don't break people's working systems. greg k-h