Re: [RFC] bonding: fix workqueue re-arming races
From: Jiri Bohac <hidden>
Date: 2010-09-01 20:55:40
On Wed, Sep 01, 2010 at 01:00:38PM -0700, Jay Vosburgh wrote:
Jiri Bohac [off-list ref] wrote:quoted
I don't think this patch opens new races. The current race scenario is: 1) schedule_delayed_work(foo) 2) foo's timer expires and foo is queued on bond->wq (possibly, foo starts to run and either gets preempted or sleeps on rtnl) 3) bond_close() sets kill_timers=1 and calls cancel_delayed_work() which accomplishes nothing 4) bond_open() sets kill_timers=0 5) bond_open() calls schedule_delayed_work(bar) 6) foo may run the "commit" work that should not be run 7) foo re-arms 8) if (foo == bar) -> BUG /* bond->mode did not change */ With this patch, it is: 1) schedule_delayed_work(foo) 2) foo's timer expires and foo is queued on bond->wq 3) foo may have queued foo_commit on bond->wq_rtnl 4) bond_close() cancels foo 5) bond_open() 6) foo_commit may run and it should not be run The patch avoids the problem of 7) and 8)But the "with patch" #6 is a bigger window; it doesn't require step 5; the foo_commit, et al, can always run after bond_close (because nothing ever cancels the foo_commit except for unregistration). That's the part that makes me nervous.
We can always call cancel_work(foo_commit) in bond_close. This should make the race window the same size it is now. I didn't do that because I was thinking we'd avoid the race somehow completely. Perhaps we can do cancel_work() now and solve it cleanly later.
The current race, as I understand it, requires a "close then open" sequence with little delay between the two.
Yeah, not sure how small the delay has to be. With releasing bond->lock, acquiring rtnl and re-acquiring bond->lock in most of the work items it may be pretty long. Putting an extra check for kill_timers after bond->lock is re-acquired will make the window much smaller ... just in case this is the way we want to "fix" race conditions ;-)
quoted
I think the race in 6) remains the same. It is now easier to fix. This could even be done with a flag (similar to kill_timers), which would be set each time the "commit" work is queued on wq_rtnl and cleared by bond_close(). This should avoid the races completely, I think. The trick is that, unlike kill_timers, bond_open() would not touch this flag.I'm chewing on whether or not it's feasible to introduce some kind of generation count into bond_open/close, so that, e.g., at bond_close, the generation is incremented. Each time any of the work items is queued, the current generation is stashed somewhere private to that work item (in struct bonding, probably). Then, when it runs, it compares the current generation to the stored one. If they don't match, then the work item does nothing.
I thought about the generation count as well before I did this patch. I don't think you can put the counter in struct bonding -- because that would be overwritten with the new value if the work item is re-scheduled by bond_open. I think you would have to create a new dynamic structure on each work schedule and pass it to the work item in the "data" pointer. The structure would contain the counter and the bond pointer. It would be freed by thework item. I did not like this too much.
quoted
[BTW, this is https://bugzilla.novell.com/show_bug.cgi?id=602969 ,Novell BZ account needeed]My BZ account is unworthy to access that bug; can you provide any information as to how they're hitting the problem? Presumably they're doing something that's doing a fast down/up cycle on the bond, but anything else?
They are doing "rcnetwork restart", which will do the close->open. Perhaps all the contention on the rtnl (lots of work with other network interfaces) makes the race window longer. I couldn't reproduce this.
I'm wondering if there's any utility in the "generation count" idea I mention above. It's still a sentinel, but if that can be worked out to reliably stop the work items after close, then maybe it's the least bad option.
Not without the dynamic allocation, I think. How about the "kill_timers" on top of this patch (see my previous e-mail) -- a flag that would be set when queuing the "commit" work and cleared by bond_close()? While this can not stop the re-arming race it is trying to stop now, it should be able to stop the "commit" work items (where it does not matter if you try to queue them on the workqueue for a second time, since it is not a delayed work). -- Jiri Bohac [off-list ref] SUSE Labs, SUSE CZ