Thread (62 messages) 62 messages, 5 authors, 2014-10-13

Re: [PATCH] cpufreq, store_scaling_governor requires policy->rwsem to be held for duration of changing governors [v2]

From: Prarit Bhargava <hidden>
Date: 2014-08-13 19:57:47
Also in: lkml


On 08/05/2014 06:51 PM, Saravana Kannan wrote:
I definitely have a fix for this and the original race you reported. It's
basically reverting that commit you reverted + a fix for the deadlock. That's
the only way to fix the scaling_governor issue.

You fix the deadlock by moving the governor attribute group removing to the
framework code and doing it before STOP+EXIT to governor without holding the
policy lock. And the reverse for INIT+STOP.
I'm still not convinced of the deadlock so I did a bit of additional research
and am pretty close to saying that this is a false positive from the lockdep
code in the kernfs area.

A few things that have caused me concern about the lockdep splat we're seeing:

1.  The splat occurs when we hit __kernfs_remove+0x25b/0x360 which resolves to

        if (kernfs_lockdep(kn)) {
                rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);  <<< RIGHT HERE
                if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
                        lock_contended(&kn->dep_map, _RET_IP_);
        }

ie) the *ONLY* way we hit a "deadlock" in this code is if we have LOCKDEP
configured in the kernfs.

It should be noted, that having kernfs_lockdep() always return 0 [1], results in
NO additional lockdep warnings.

Additionally the splat contains

[  107.428421]        CPU0                    CPU1
[  107.433482]        ----                    ----
[  107.438544]   lock(&policy->rwsem);
[  107.442459]                                lock(s_active#98);
[  107.448916]                                lock(&policy->rwsem);
[  107.455650]   lock(s_active#98);

which also points to the situation above (s_active is the default naming used in
the kernfs lockdep code).

In short -- there is no deadlock here.  It only happens in the lockdep code
itself, not because lockdep has identified a real problem.

2.  I then started asking myself why this was occurring.  The reason appears to
be that the attribute for scaling_governor is deleting other sysfs attributes
and that got me to wondering if there were other areas where this occurred and I
remembered it does!  In the USB code writing and reading to the  bConfiguration
of a device may lead to the removal of "down stream" attributes.  The reading
and writing of bConfiguration occurs in
drivers/usb/core/sysfs.c:79


/* configuration value is always present, and r/w */
usb_actconfig_show(bConfigurationValue, "%u\n");

static ssize_t bConfigurationValue_store(struct device *dev,
                                         struct device_attribute *attr,
                                         const char *buf, size_t count)
{
        struct usb_device       *udev = to_usb_device(dev);
        int                     config, value;

        if (sscanf(buf, "%d", &config) != 1 || config < -1 || config > 255)
                return -EINVAL;
        usb_lock_device(udev);
        value = usb_set_configuration(udev, config);
        usb_unlock_device(udev);
        return (value < 0) ? value : count;
}

... and the next lines are IMO important here:

static DEVICE_ATTR_IGNORE_LOCKDEP(bConfigurationValue, S_IRUGO | S_IWUSR,
                bConfigurationValue_show, bConfigurationValue_store);

FWIW, it isn't *exactly* the same ... but commit
356c05d58af05d582e634b54b40050c73609617b explains a similarity between what is
happening with our lockdep splat and the lockdep issues seen in USB.

3.  I came across this from an earlier discussion ...

https://lkml.org/lkml/2010/1/29/306

"We get false positives when the code of a sysfs attribute
synchronously removes other sysfs attributes.  In general that is not
safe due to hotplug etc, but there are specific instances of static
sysfs entries like the pm_core where it appears to be safe."

...


So ... the question that I have is: is this lockdep splat "real"?  It seems to
only occur because we enable the lockdep code in kernfs, that is it occurs as a
side-effect and doesn't appear to be "real" to me.

I only offer this in an effort to keep work to a minimum ;)

P.

[1] It wasn't that simple.  There are some other changes that have to be made.
But you get the idea ...
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help