Re: 2.6.25rc7 lockdep trace
From: Johannes Berg <johannes@sipsolutions.net>
Date: 2008-03-29 10:02:37
[merging your two mails]
The issue is that linkwatch goes: workqueue lock --> RTNL And thus taking workqueue lock within RTNL is deadlock prone no matter where you do it.
Which lock are you referring to here? There is no real runqueue lock other than a spin-lock for the list. I think you may be misunderstanding the lockdep output in a way I didn't anticipated when adding lockdep debugging for workqueue stuff. The actual work function is _not_ running with any locks held! The fact that you see lockdep output there is because we tell lockdep we're holding two locks, namely the "work" and the "workqueue". These are fake, they aren't actually locks. More importantly, those "locks" are locked and unlocked for every _run_ of the struct work function, not around the whole runqueue manipulation, that part is done atomically without global locks. So what you have is essentially list_for_each_work lock(workqueue) lock(struct work) work->f(); unlock(struct work) unlock(workqueue) where both those locks are really only telling lockdep there are locks to get it to warn about the deadlock scenario. Unfortunately Andrew mangled the two patches into one when committing, so I can't point to the changelog, but here are two links to the two patches adding this: http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch I think the "runqueue lock" you're referring to is the fake "lock(workqueue)" which isn't around the whole runqueue manipulation but only a lockdep helper to debug exactly the deadlock problem we're talking about. I'm open to renaming it, but I don't think lockdep supports formatting its output differently.
I don't see how you can not race with the transition from scheduled to "executing" without taking the runqueue lock for the testing.
When you call cancel_work_sync(), the work struct will be grabbed by the code (really __cancel_work_timer) and removed from the queue. That just operates on bits and a spinlock, not locks held across the struct work function execution, and ensures it is race-free without needing any such locks.
And it is crucial that the workqueue function doesn't execute "accidently" due to such a race before the module and thus the workqueue code is about to get potentially unloaded.
That is exactly what cancel_work_sync() guarantees, but not more. On contrary, flush_workqueue() also guarantees that the work is run one last time if it was scheduled at the time of the flush_workqueue() call. However, as I just tried to explain, cancel_work_sync() _is_ safe to run while holding the RTNL because it doesn't need any runqueue lock. johannes
Attachments
- signature.asc [application/pgp-signature] 828 bytes