Re: 2.6.25rc7 lockdep trace

From: Johannes Berg <johannes@sipsolutions.net>
Date: 2008-03-29 10:02:37

[merging your two mails]

The issue is that linkwatch goes: workqueue lock --> RTNL

And thus taking workqueue lock within RTNL is deadlock
prone no matter where you do it.

Which lock are you referring to here? There is no real runqueue lock
other than a spin-lock for the list. I think you may be misunderstanding
the lockdep output in a way I didn't anticipated when adding lockdep
debugging for workqueue stuff.

The actual work function is _not_ running with any locks held! The fact
that you see lockdep output there is because we tell lockdep we're
holding two locks, namely the "work" and the "workqueue". These are
fake, they aren't actually locks.

More importantly, those "locks" are locked and unlocked for every _run_
of the struct work function, not around the whole runqueue manipulation,
that part is done atomically without global locks.

So what you have is essentially
list_for_each_work
	lock(workqueue)
	lock(struct work)
	work->f();
	unlock(struct work)
	unlock(workqueue)

where both those locks are really only telling lockdep there are locks
to get it to warn about the deadlock scenario. Unfortunately Andrew
mangled the two patches into one when committing, so I can't point to
the changelog, but here are two links to the two patches adding this:
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch

I think the "runqueue lock" you're referring to is the fake
"lock(workqueue)" which isn't around the whole runqueue manipulation but
only a lockdep helper to debug exactly the deadlock problem we're
talking about. I'm open to renaming it, but I don't think lockdep
supports formatting its output differently.

I don't see how you can not race with the transition from
scheduled to "executing" without taking the runqueue lock
for the testing.

When you call cancel_work_sync(), the work struct will be grabbed by the
code (really __cancel_work_timer) and removed from the queue. That just
operates on bits and a spinlock, not locks held across the struct work
function execution, and ensures it is race-free without needing any such
locks.

And it is crucial that the workqueue function doesn't
execute "accidently" due to such a race before the module
and thus the workqueue code is about to get potentially
unloaded.

That is exactly what cancel_work_sync() guarantees, but not more. On
contrary, flush_workqueue() also guarantees that the work is run one
last time if it was scheduled at the time of the flush_workqueue() call.

However, as I just tried to explain, cancel_work_sync() _is_ safe to run
while holding the RTNL because it doesn't need any runqueue lock.

johannes

Attachments

signature.asc [application/pgp-signature] 828 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help